FUNDAMENTAL CONCEPTS


An abiding faith in the importance of understanding the principles 
which govern human behavior underlies all psychological investiga- 
tion and study. The study of personality, like the study of any other 
psychological discipline, is pursued with the thought that it will 
ultimately lead to the greater understanding of the forces that con- 
trol human behavior. 

In this textbook we shall attempt to gain insight into the meaning 
of personality and to gain an appreciation of the value of personality 
as an explanatory principle in human behavior. We shall do this by 
making a detailed study of the major methods used in the measure- 
ment of personality. If we can accept Lord Kelvin’s dictum: that 
whatever exists can be measured, and if we can find a successful 
“measurer” of personality, we shall then surely know what
personality is.


OBJECTIVES OF PERSONALITY MEASUREMENT 


There are three objectives to be gained by the measurement of 
personality. They are the better understanding of individual be- 
havior, the better understanding of group behavior, and the better 
understanding of the interactions between individual and group 
behavior. Traditionally, the first of these objectives belongs to 
psychology, the second to sociology, and the third to that hybrid 
discipline, social psychology. We shall not concern ourselves with 
this tripartite division, however, as it adds nothing of significance 
for our purposes. We shall be concerned in this volume with all useful
methods of personality measurement; and unless it is necessary for
our better understanding of them, we shall not concern ourselves
with their academic origins.


Individual Behavior. Our first objective is that of understanding 
individual behavior. By this statement we mean to imply that we 
should like to be able to describe, predict, and control the behavior 
of individuals. We should like to be able to tell what personality 
characteristics an individual possesses, to tell what types of behavior 
these personality characteristics imply, and, in part at least, to have 
some measure of control over the occurrence or nonoccurrence of 
this implied behavior. 

Let us understand at once that the control of human behavior, in 
the sense herein intended, implies in no way the type of control
involved in autocracy, despotism, or statism. It can at times mean 
police control but most frequently means control in the sense of our 
having some knowledge of possible outcomes. An example of the 
type of control intended can be seen in the field of astronomy. The 
astronomer in no way forces an eclipse to occur or prevents one from 
occurring. But he alters his own behavior in accord with his knowl- 
edge concerning the time and place of the event in question. In 
brief, the astronomer can describe an eclipse, he can understand the 
meaning of an eclipse, he can predict its occurrence, and he can, in 
the light of this knowledge, exercise a type of control that enables 
him to make use of such an event when it occurs.

It is in a similar sense that control should be understood in regard
to human behavior. Certainly no psychologist, to the author’s
knowledge, wishes to emulate Hitler in controlling the actions of
other individuals. The psychologist is interested, however, in control
in the sense of being able to take proper action in the light of the
behavior of other individuals.

Criminal Tendency. An illuminating example showing wherein
knowledge of personality can be useful in describing, understanding,
predicting, and controlling individual human behavior is given by
Dr. Maud A. Merrill in her readable volume Problems of Child
Delinquency. One of her case histories describes in careful detail the
development of a criminal career. It is as follows:

marks from the books and then sold them to buy candy. They were taken by the
welfare officer to the probation office where they were talked to and then sent home. 
A plan was outlined at this time by which the boys were to be given small jobs like 
weeding gardens, so they could earn money for candy and amusements. 

Age 9 years, 3 months. Jerry and his brother stole two bicycles from a military 
academy. They were turned over to the city welfare officer and no report of the 
theft was made to the probation office. 

Age 9 years, 4 months. (First juvenile court contact, Judge A presiding.) Jerry and 
his brother broke into an oil station and stole four dollars from the cash register. 
They were certified to the juvenile court, made wards, and released to go home under 
a suspended threat of placement at St. G’s (Catholic) boarding school. 

Age 9 years, 5 months. (First twenty-four-hour school.) Jerry and another boy 
stole two bicycles, whereupon Jerry was placed, by the juvenile court, at St. G's 
parochial school. Jerry remained at St. G’s ten months and then returned home. 

Age 10 years, 7 months. (Training school.) Jerry, again in company with his brother,
Al, stole two bicycles from the same military academy which was the scene of their
former exploit. Jerry was committed to W. State Training School, his brother re- 
leased to go home. He remained at the institution nine months and sixteen days and 
returned to his own home. 

Age 12 years, 6 months. (Recommitted as parole violator.) Jerry and his brother 
broke into a sporting goods store and stole ten dollars from the cash register. Jerry, 
on parole to the training school, was returned to the institution and Al was also sent 
to the same training school. This time Jerry remained at the institution another 
nine months before he was released. 

Age 14 years, 8 months. Jerry and his brother broke into a store and stole candy 
and gum. They were certified to the juvenile court and the parole officer from W. 
notified, but instead of returning the boys to the institution the parole officer re- 
leased them to their home. 

Age 15 years, 7 months. (Again committed as parole violator.) Jerry and his brother 
were apprehended after stealing many valuable articles such as a camera, case of 
surgical instruments, etc., from parked cars. At this time, the boys confessed to 
burglarizing two oil stations on the two previous nights. Jerry was returned to W. 
Al released to his home on probation (by Judge B). This time Jerry was released 
from W. after nine months and then returned home. 

Age 16 years, 7 months. (State Reformatory.) Jerry, unaccompanied by a com- 
panion, entered a dwelling and stole money and jewelry. He was committed by the 
juvenile court to the P. state reformatory where he served an eighteen-months’ 
sentence before he was released to return to his own home. 

Age 19 years, 9 months. (State prison.) Jerry stole an automobile, was charged 
with grand theft, tried by jury in superior court and acquitted. On his way home 
after the trial he stole another car. But this time he was promptly found guilty and
sentenced to state’s prison.

Dr. Merrill’s analysis of this case makes it clear that Jerry’s
behavior can, in large part, be explained and [...]
his personality characteristics. Jerry was 8 [...]
brought to the clinic. It was immediately apparent to the counselor
that his actions represented an attempt to satisfy the common child- 
hood cravings for playthings and sweets. These needs were not 
satisfied at home because of the poor economic circumstances of his 
family. His mother being dead, he received little supervision, and 
an older sister, who might have given proper guidance, was too busy 
combining her school- and housework. 

Jerry’s appearance was disarming and likable. Therefore, his 
teachers readily excused his misdemeanors. A similar situation 
prevailed at home. The adult members of his family accepted his 
behavior, rationalized it, and were more than willing to make ex- 
cuses for his conduct. Throughout his childhood, Jerry had an 
ability to “smile his way out of situations.” As time went on, he 
became more and more adept in this technique. The attitudes of the 
adults with whom he came in contact made his actions socially 
acceptable. 

Had there been early in Jerry’s life a proper understanding of
his personality and an adequate prognosis of the types of behavior
into which it might later lead, as it did in fact, it might have
been possible for the training officers to have exerted a greater
degree of control over Jerry’s behavior. And if this had been the
case, it is entirely possible that one less criminal career might now
be on record.


Mental Maladjustment. A second example showing wherein greater
knowledge of personality might have resulted in beneficial control is
given by Farnsworth and Ferguson. They describe the case of a
suicide involving marked and measurable changes in personality.

B, a college student, was, along with other members of his freshman
class, asked to take the Bernreuter Personality Inventory (to
be described in Chap. 7). One year later, when a sophomore, B, along
with many of his sophomore classmates, took the Bernreuter test a
second time and shortly thereafter committed suicide. On the first
testing, when B was a freshman, he received scores commonly interpreted
as indicating above-average degrees of self-sufficiency and
self-confidence, an average degree of emotional stability, [...]


self-sufficient, more nonsocial, and in possessing more than an aver- 
age degree of self-confidence. 

When tested as a sophomore, B secured scores which are com- 
monly interpreted to mean serious emotional instability, lack of 
self-confidence, a low degree of sociability, marked submissiveness, 
and decided introversion. Scores on self-sufficiency showed no 
change. The authors state: 


Originally, B denied . . . that he had ever crossed the street to avoid meeting 
some person; that he had been troubled with shyness; that he frequently felt 
grouchy; that he experienced many pleasant or unpleasant moods or periods of 
loneliness even in the company of other people; that he found it difficult to speak in 
public; that he lacked self-confidence; that he allowed particularly useless or 
bothersome thoughts to come into his mind; that his mind often wandered so badly 
that he lost track of what he was doing; that his feelings alternated between happi- 
ness and sadness without apparent reason; that he felt that marriage was essential 
to his present or future happiness; or that he was usually considered to be indiffer- 
ent to the opposite sex. On the retest he admitted all of these things. Originally he 
professed to believe that he made new friends easily, but on the retest blank he 
asserted he found this difficult. 


The numerical results of both the first and second testings, and
the differences between them, are set forth in Table 1.


TABLE 1. Scores Made by Student B on the Bernreuter Personality Inventory*

                                     Percentile ranks
Bernreuter scale            First test   Second test   Change in rank
B1-N (neuroticism)              50           83              33
B2-S (self-sufficiency)         85           87               2
B3-I (introversion)             43           78              35
B4-D (dominance)                55           25             -30
F1-C (self-confidence)          33           77              44
F2-S (sociability)              88           98              10

* From Farnsworth, P. R., and Ferguson, L. W. The growth of a suicidal tendency as
indicated by score changes in Bernreuter’s Personality Inventory. Sociometry, 1938, 1, 339-341.


The greatest change is in self-confidence, which changes from the 33d percentile
on the first test to the 77th percentile on the second test. The next
greatest change comes in the increasing degree of introversion (35
percentile points) and of neuroticism (33 percentile points); and the
third change, one of 30 percentile points, comes in the lower degree
of dominance in face-to-face situations. These changes are decidedly
atypical when viewed against the background of changes for most 
of B’s classmates. The average intercorrelation between the scores 
on the first and second tests approximates .70; and only two of B's 
classmates, who scored as he did on the first test, exhibited similar 
changes. And in both of these instances, the changes that did occur 
were not of the same magnitude as those for B. 
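
The arithmetic behind Table 1 can be retraced in a few lines. What follows is a minimal sketch in Python and is no part of the Farnsworth and Ferguson study; the names SCORES and rank_changes are hypothetical labels of our own, and the percentile ranks are simply those reported in Table 1. The sketch computes each change in rank as second test minus first test and orders the scales by the size of the change, without regard to direction.

```python
# Percentile ranks for student B as reported in Table 1: (first test, second test).
# SCORES and rank_changes are illustrative names, not taken from the source.
SCORES = {
    "B1-N (neuroticism)": (50, 83),
    "B2-S (self-sufficiency)": (85, 87),
    "B3-I (introversion)": (43, 78),
    "B4-D (dominance)": (55, 25),
    "F1-C (self-confidence)": (33, 77),
    "F2-S (sociability)": (88, 98),
}


def rank_changes(scores):
    """Return (scale, change) pairs, largest absolute change first."""
    changes = {scale: second - first for scale, (first, second) in scores.items()}
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for scale, change in rank_changes(SCORES):
        print(f"{scale:<26}{change:+d}")
```

Ordering by absolute size places self-confidence first, then introversion, neuroticism, and dominance, which is the order followed in the discussion of the table.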


It appears from these data that personality changes may have
played an important part in B’s decision to commit suicide. Therefore,
had appropriate information been in the hands of a counselor,
it is possible that effective preventive steps might have been taken.
In this case, there is a strong suggestion that one testing of personality
may not always be sufficient for a complete understanding of
individual behavior. The changes which took place in B over a
year’s period of time seem to constitute significant data. The authors
conclude, “From these data one could not have foretold the subsequent
tragedy; yet it would seem rather obvious that B needed
counseling and perhaps psychiatric attention. ... The retest procedure
gives a dynamic picture which can never be attained by a
single test.”

Marital Happiness. A third example of the way in which knowledge
gained from the measurement of personality may contribute
to the understanding of individual behavior is given in Terman’s
study of Psychological Factors in Marital Happiness. One of the
goals in this study was to determine the relative importance of
personality, background, and sex-adjustment factors in marital
happiness. Personality factors were assessed primarily by means of
the Bernreuter Personality Inventory and the Strong Vocational
Interest Test. Along with background and sex-adjustment factors,
personality factors were found to be important determiners of
marital happiness. Some of the pertinent correlations which Terman
reports are presented in Table 2. In view of these correlations there
need be no question that the understanding of personality can lead
to the better prediction and control of the behavior to be expected
in marriage.


[...] concept, or the example just given will [...]
of Terman’s and other investigators’ [...]
contend that it is foolish to think that two people in love would
submit themselves to a premarital happiness test and would dissolve 
their intended partnership if the scores secured were not entirely 
satisfactory. Granted! But the critic who stops here has an entirely 
inadequate concept of our ideas concerning the control of behavior. 
In the instance under consideration, control is exercised in our 
possession of the knowledge of factors which may detract from an 
optimum state of marital happiness. For example, one of Dr. Terman’s
findings was that differences in mental ability, particularly
when the wife was the superior partner, frequently proved a handicap
in the achievement of an optimum degree of marital happiness.
Now if both partners can be made aware of this possibility, this
factor can be controlled. The partners can take steps, not to remove
the mental inequality, but to see that this factor does not prevent
their achieving marital happiness. Briefly, the mentally superior
partner will have to give up any expectation of equal participation
by the inferior spouse in many intellectual pursuits. Likewise, the
mentally inferior partner will have to give up any expectation of
attempting to be an equal of the superior spouse in many intellectual
pursuits. Knowledge of the inequality and of the fact that it sometimes
creates trouble provides the basis for making suitable adjustments
therefor.


TABLE 2. Marital Happiness in Relation to Personality, Background, and Sex
Adjustment*

Factor               Husband     Wife
Personality            .47        .46
Background             .35        .29
Sex adjustment         .49        .49

Total                  .59        .57

* From Terman, L. M. Psychological Factors in Marital Happiness. New York:
McGraw-Hill Book Company, Inc., 1938.


It is sometimes the case that religious differences make for unhappiness
in marriage. This does not mean that two individuals of
differing religious faiths should not marry each other. But it does 
mean that they should be aware of the possible effects of such dif- 
ferences in their religious views, so that effective countermeasures 
can be taken. Just as the astronomer does, we can adjust ourselves 
and our behavior to anticipated outcomes and can exert control in
making suitable adaptations to the phenomena or behavior in
question.


Vocational Success. Our fourth and last example showing that
knowledge of personality is an important factor in the better understanding
of individual behavior lies in the field of vocational selection.
It has for many years been suspected, and for a smaller number
of years been definitely established, that personality factors loom
large in the success of life insurance salesmen. Drs. Albert K. Kurtz
and Arthur W. Kornhauser, among others, have shown, through the
validation of a test known as the Aptitude Index, that a knowledge
of the personality of a prospective life insurance salesman enables
an employer to make a reasonably accurate estimate of an applicant’s
chances for success in the life insurance business. By and
large, applicants who secure above-average scores on the Aptitude
Index tend to produce an above-average volume of business. And
applicants who secure below-average scores on the Aptitude Index
tend to produce a below-average volume of business. One set of data
illustrating this point is given in Table 3.

TABLE 3. Aptitude Index Scores and Volume of Life Insurance Sold

Score on Test        Average Yearly Sales
Above average        $69,000
Average              [...]
Below average        [...]


These data indicate that a proper understanding of personality, as
measured by the Aptitude Index, can lead to an effective degree of
control in the selection of applicants for employment in the life
[...] Examples in other lines of business might
[...] be largely repetitious and would not
[...] discussion.

[...] early diagnosis of incipient [...] maladjustment, in
the discovery of factors related [...] happiness, and in the
[...] vocational success.
[...] endeavor he finds that an understanding of personality can be of
material aid.

Group Behavior. Our second objective is to describe, understand, 
predict, and control the behavior of groups of individuals. We should 
like to know the predominant personality characteristics of the 
Chinese, the Russians, the African Hottentot, the lower class, the
college graduate, the adolescent in our society, the adolescent in 
other societies, the parent, the child, the employer, the employee, 
and so on. We should like to be able to say what types of group 
behavior and attitudes these predominating personality character- 
istics imply. And we should like to be able to exert some control, as 
we have twice defined it, over anticipated behavioral outcomes. 

There are to be found many examples of attempts to influence the
behavior of groups of individuals. Not all these attempts have been
based upon adequate knowledge of the personality characteristics
involved, however, and those that were not were destined for failure.

Historically, we could cite numerous instances of one nationality
group trying to influence other nationality groups, frequently with
disastrous results, World War II and its aftermath being our most
recent example. Various church groups have sent missionaries
throughout the world to convert the heathen. One of the most interesting
recent accounts of such a venture is that given by E. Lucas
Bridges in his volume Uttermost Part of the Earth. In this informative
volume, Bridges describes how his father, Thomas Bridges, an
Englishman, set out to convert the Indians of Tierra del Fuego:
the Ona, Aush, Yahgans, etc., to many of the ways of European
civilization.

Destruction of Morale. While a detailed examination of any one or
more of the foregoing attempts to exert influence would prove highly
interesting, it will prove more profitable for us to examine an attempt
that was based upon a more adequate knowledge of the psychological
factors involved. During World War II there was set up in the Office
of War Information a Foreign Morale Analysis Division. One of the
major duties of this Division was to make an analysis of the status
of Japanese morale, both military and civilian, and to suggest ways
in which this morale could be affected, adversely for the Japanese and
favorably for the Allies. The manner in which this work was conducted
and a brief account of the results are given by Leighton in
his book Human Relations in a Changing World.
When the Foreign Morale Analysis Division first tackled the
problem which Leighton describes, Japanese military morale was 
exceedingly high, and there was no outward reason to believe that 
civilian morale was not equal thereto. Allied policymakers wished 
information which would show them why this morale had such 
strength, whether it had flaws, what changes (if any) could be 
expected, and what could be done to influence it in a direction favor- 
able to the Allied cause. 

To seek the answers to these questions the Foreign Morale Analysis
Division developed a systematic method of examining and
classifying all available intelligence data. Primary sources available
for examination were captured diaries, letters and official documents,
reports from neutral observers, Japanese newspapers and periodicals,
radio broadcasts, and prisoner-of-war interrogation reports. Data
from these sources were subjected to continuous analysis in accord
with a specific and predetermined frame of reference. This frame of
reference was designed to make possible the integration of the data
received, so that they could be made useful as a basis for the formulation
of methods which the Allies might employ to damage or to
affect adversely Japanese civilian and military morale.

The frame of reference within, and according to, which the Foreign
Morale Analysis Division proceeded to make [...]
by Leighton in a series of 14 theor[...]
[...] was to gather information from the sources cited; to analyze these
data in the light of their relevance, according to the basic assumptions;
to prepare reports showing how the analyzed information
suggested particular courses of action; [...]
these courses of action would lower Ja[...]
the Division made several follow-[...]
results had been achieved. In this [...]
the correctness of some of the original a[...]

[...] individuals, rather than their destruction and disintegration, is to be
found in the work of the U.S. Office of Indian Affairs. The adminis- 
tration of this bureau has not always been lily-white, but in its more 
recent years it has made serious and conscientious attempts to 
influence to their own advantage such Indian groups as the Navaho, 
the Hopi, the Aleuts, and the Eskimo. Without destroying the long- 
standing values in the cultures concerned, the Office of Indian 
Affairs is making serious effort to see that these and other Indian 
groups adapt themselves to such modes of the white man’s way of 
living as will ensure their economic and cultural survival. This 
requires real, earnest, and thorough knowledge of many charac- 
teristics—including personality characteristics—of the Indian groups 
concerned. 

We may cite as an outstanding example, in this connection, the 
work described by Kluckhohn and Leighton in their volumes Children
of the People and The Navaho. The former treats primarily of
the individual and of the formation of his personality, while the 
latter is devoted chiefly to the situational and cultural aspects of 
Navaho life. In these volumes, Kluckhohn and Leighton sought “to 
investigate, analyze, and compare the development of personality 
in five Indian tribes in the context of their total environment
(sociocultural, geographical, and historical) for implications in
regard to Indian Service Administration.” Kluckhohn and Leighton
observe:


The Navaho way of life may be learned only by knowing individual Navahos; 
conversely, Navaho personality may be fully understood only insofar as it is seen 
in relation to this lifeway and to the other factors in the environment in the widest 
sense. Understanding of Navaho culture is dependent upon acquaintance with 
personal figures, but equally these personal figures get their definition and organiza- 
tion as individuals when the student is in a position to contrast each one with the 
generalized background provided by the culture of The People. 


We cannot review here all of the significant problems discussed by 
Kluckhohn and Leighton. They cover much of Navaho history, 
analyze their language structure, review their relations with the 
white man, discuss economic factors in their gaining a livelihood, 
discuss their personal and social habits, and finally suggest a line of 
conduct for the Office of Indian Affairs. For our purposes it will be 
sufficient if we take note of a comparison of the marriage patterns 
of the white person and the Navaho. The contrast in these two
patterns will point up the difficult problem facing any white man 
who, with his background, sets out to understand the attitudes and 
behavior of the Navaho. Kluckhohn and Leighton state: 


1. For whites, marriage is an arrangement, economic and otherwise, between
two individuals. The two spouses and the children, if any, are the ones primarily
involved in any question of inheritance.

For the Navaho, marriage is an arrangement between two families much more
than it is between two individuals.

2. For whites, a man’s recognized children, legitimate or illegitimate, have a
claim upon his property.

For the Navaho, sexual rights are property rights. Therefore, if a man has children
from a woman without undertaking during his lifetime the economic responsibilities
which are normally a part of Navaho marriage, the children, however much he
admitted to biological fatherhood, were not really his: “He just stole them.”

3. For whites, inheritance is normally from the father or from both sides of the
family.

For the Navaho, inheritance is normally from the mother, the mother’s brother,
or other relatives of the mother; from the father’s side of the family little or nothing
has traditionally been expected.

4. For whites, as long as a wife or children survive, no other relatives are concerned
in the inheritance, unless there was a will to that effect.

For the Navaho, while children today, in most areas, expect to inherit something
from their father, they do not expect to receive his whole estate or to divide it
with their mother only. Sons and daughters have different expectations.

5. For whites, all types of property are inherited in roughly the same way.

For the Navaho, different rules apply to different types of property. Range land
is hardly heritable property at all; farm land normally stays with the family which
has been cultivating it; livestock usually goes back, for the most part, to the father’s
sisters and maternal nephews; jewelry and other personal property tend to be
divided among the children and other relatives; ceremonial equipment may go to
a son who is a practitioner or to a clansman of the deceased.

In these two patterns we see five important ways in which Navahos
and whites differ from each other, and this in only one segment
of the entire cultural complexes of the two peoples. Certainly we
need not emphasize the obvious point that the understanding of
Navaho personality is essential to the adequate description, prediction,
and control of Navaho behavior.


Individual and Group Interaction. Our third objective in wanting
to measure or evaluate personality is our wish to be able to
describe, understand, predict, and control the interactions between
individual behavior and group behavior. How does the behavior of
an individual affect that of the group? And how does group behavior
affect that of the individual?

Effect of Group on Individual. Data illustrating the effect of the 
group upon individual behavior are provided by Kinsey, Pomeroy, 
and Martin in their much publicized volume Sexual Behavior in the 
Human Male. Kinsey, Pomeroy, and Martin show, among other 
things, that contact with animals as a male sexual outlet varies in 
frequency according to educational level. In a group of boys with 
rural backgrounds, Kinsey, Pomeroy, and Martin found those 
reporting the greatest frequency of contact with animals were boys 
who had completed 13 or more grades of school. The group reporting 
the next most frequent contact with animals were boys who had 
completed from 9 to 12 grades, and the group reporting the least 
frequent contact with animals were boys who had completed 8 
grades of school or less. The striking feature in this trend is that it 
becomes apparent prior to the time at which, one would suppose, much
conscious thought could have been given to the amount of schooling 
to be obtained. There must be at work here some differential selec- 
tive factor that tends to segregate as one group both boys who will 
obtain the greatest amount of schooling and boys who will also have 
the greatest frequency of contact with animals. It does not appear 
unreasonable to assume that this subtle influence stems from the 
characteristics of the group in which the individual maintains, or 
attempts to maintain, his chief childhood status. 

Effect of Individual on Group. Examples showing the influence of 
the personality of one individual upon the characteristics of a group 
are far less numerous than those showing the effect of a group upon 
an individual. When they do occur, they are apt to be more striking, 
however, as indeed they must be in order to receive any notice. One 
of the most unfortunate examples of recent date is to be found in the 
person of Adolf Hitler and in his influence upon several generations 
of German youth. A much more pleasant example may be found in 
Maria, the famous potter of San Ildefonso. Almost singlehandedly, 
Maria changed for the better a very large segment of the cultural 
complex of her now famous Pueblo. 

Before we can describe how Maria exerted her influence, we must 
understand, as narrated by many authors, that the early Pueblo 
Indians in the American Southwest earned their livelihood primarily 
in an agricultural rather than in a hunting economy. Such an 
economy provides, as we well know, certain off seasons of leisure
time. The Pueblo Indians made use of this leisure time to develop 
a highly complex and most interesting civilization. If we can take 
note of just one aspect of this civilization—the art of pottery mak- 
ing—and briefly sketch its history, we will have the background for 
Marriott’s account of the story of Maria.

Archeologists, in general, divide the cultural history of the 
Pueblos into seven great periods. The first of these is considered to 
extend from A.D. 100 to 500, the second from 500 to 700, the third
from 700 to 900, the fourth from 900 to 1050, the fifth from 1050 to 
1300, the sixth from 1300 to 1700, and the seventh from 1700 to 
the present. In the very first of these periods, the Basket Maker 
period, pottery making was unknown. In the second period, the 
Pueblos invented or were told how to make very simple unfired 
pottery. From this simple beginning, through each succeeding 
period up to the fifth (the Classic period) the Pueblos increased 
both their technical and artistic skill in the art of pottery making. 
After this there began a decline, first in the loss of artistic skill 
and later in the loss of technical skill as well. When the Spaniards, 
under Coronado, invaded the Southwest in 1540, they initiated so 
many changes in Pueblo culture that ultimately the art of pottery 
making was lost completely. 

San Ildefonso, the particular village with which we are concerned, 
was founded, according to Martin, Quimby and Collier (see their 
book Indians before Columbus), about 1700, at the beginning of the 
period known as Pueblo V, the period of domination by Europeans. 
By 1915 San Ildefonso, because of Spanish, Mexican, and American 
domination, was an impoverished group with little reason for 
optimism regarding its future. Then Maria, by rediscovering the 
ancient art of pottery making and by her skill therein, was able to 
bring about a revivalistic movement of such proportions as to play 
an economically important part in the life of her people. 

Maria was born in 1881, or thereabouts, and when 14 or 15 years 
of age, she began to take a serious interest in pottery making.
Receiving encouragement from a group of archaeologists and help
from her artistically capable husband, Maria was able to master the
old craft of pottery making. Finding her pottery salable on the white
market, she was able to find a solution to her own economic problems
and then finally to those of her entire village.

For some years prior to the advent of Maria’s pottery making, the
population of the San Ildefonso Pueblo had declined from approxi-
mately 150 to about 80. When Maria found the way out, the popula-
tion began to increase and now numbers, according to Marriott,
more than 200.

Maria, a single person, and a woman of character, it might be
added, had a marked and clear-cut effect upon, and changed by a
considerable extent, the culture of her Pueblo village. She saved
herself, her family, and undoubtedly her entire people from a short
road to oblivion. To understand and appreciate fully the part that
Maria’s personality played in this transformation of a cultural
complex one must read in full Alice Marriott’s account of Maria,
the Potter of San Ildefonso. This book is a must for anyone interested
in the effects which the personality of one individual can have upon
the characteristics and way of life of a cultural group.


ASSUMED NATURE OF PERSONALITY ORGANIZATION 


There have been many discussions designed to settle an old argu- 
ment as to whether personality traits are general or specific in 
nature. At one extreme we have Hartshorne and May, with their 
pioneering studies in character, arguing that most personality traits 
are specific to the stimulating situation. At the other extreme we 
have Gordon W. Allport arguing that in spite of great evidence of 
apparent specificity, there is an underlying generality in at least a
great many personality traits.


In personality research, just as in all other research, our conclu-
sions are inextricably interwoven with the methods of study we
employ. Consciously or unconsciously one research worker will find
generality and another will find specificity, because of the research
technique each has used. Let us carefully examine this specificity-
generality continuum and see if we can get a clear view of some of the
difficulties that seem to be involved.

We may as well anticipate our results and state at the outset that
the generality or specificity of a personality trait depends almost
entirely upon the definition which the research worker formulates
and very little upon the data which he collects. To illustrate let us
briefly review one of the techniques used in measuring attitudes: the
equal-appearing-interval method developed by Thurstone. In this
technique the principal measuring rod (as we shall explain in Chapter
4) consists of a series of simple declaratory statements. Each of
these statements is taken to be indicative of some specified degree of
favorableness or unfavorableness in attitude toward some designated 
object. An example of such a scale is given in Table 4.


Table 4. Statements in Form A of the Peterson-Thurstone Scale for the Measurement
of Attitude toward War*

1. Under some conditions, war is necessary to maintain justice.
2. The benefits of war rarely pay for its losses even for the victor.
3. War brings out the best qualities in men.
4. There is no conceivable justification for war.
5. War has some benefits; but it’s a big price to pay for them.
6. War is often the only means of preserving national honor.
7. War is a ghastly mess.
8. I never think about war and it doesn’t interest me.
9. War is a futile struggle resulting in self-destruction.
10. The desirable results of war have not received the attention they deserve.
11. Pacifists have the right attitude, but some pacifists go too far.
12. The evils of war are greater than any possible benefits.
13. Although war is terrible it has some value.
14. International disputes should be settled without war.
15. War is glorious.
16. Defensive war is justified but other wars are not.
17. War breeds disrespect for human life.
18. There can be no progress without war.
19. It is good judgment to sacrifice certain rights in order to prevent war.
20. War is the only way to right tremendous wrongs.

* From Peterson, R. C. A Scale for the Measurement of Attitude toward War.
Chicago: University of Chicago Press, 1930.


On this scale, a person’s attitude toward war is indicated by a summation of
numerical values associated with his answers to the statements with
which he indicates some degree of agreement. Now how general or 
how specific is his attitude toward war? Let us assume it to be
properly represented by position C in Fig. 1. Let us now look for
attitudes toward war that are more specific or more general than
that indicated by position C. If there are attitudes toward war more
specific than those represented by position C, they will exhibit a
lower degree of intercorrelation among themselves than will those
scaled at position C. We find this lower degree of intercorrelation
and greater specificity in our subject’s answer to any one of the
20 items in the war attitude scale. Each of these items taps a more
specific attitude toward war than any measure based upon several of 
them jointly considered. Let us indicate the degree of specificity- 
generality involved at point B on our continuum. 

It should already be obvious that we cannot peg “attitude toward 
war” at some one point on a specificity-generality continuum. If our 
primary interest is in an item response, we shall surely find that 
“attitude toward war” is a more specific personality trait, i.e., the 
various measures thereof will exhibit a lower degree of intercorrela- 
tion, than if we interest ourselves in any summary score based upon 
several of the item responses. 
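
The difference between item responses and summary scores can be put in numbers. The following is only a sketch and not part of the original argument: it assumes the standard Spearman-Brown relation for scores built from comparable items, and the average inter-item correlation of .20 is a purely hypothetical value chosen for illustration.

```python
def summary_score_correlation(mean_item_r: float, k: int) -> float:
    """Spearman-Brown relation: expected correlation between two summary
    scores, each built from k comparable items whose average
    intercorrelation is mean_item_r."""
    return (k * mean_item_r) / (1.0 + (k - 1) * mean_item_r)


# Hypothetical average correlation between single item responses.
ITEM_R = 0.20

if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        r = summary_score_correlation(ITEM_R, k)
        print(f"items per score = {k:2d}  expected intercorrelation = {r:.2f}")
```

Under these assumptions a single item (k = 1) keeps the low intercorrelation of .20, while scores based on more items correlate progressively higher, which is the sense in which the summary score is the more general measure.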


0 -|- A  Attitude or behavior in a specific situation
1 -|- B  Response to one item in an attitude scale
2 -|- C  Score based on several items in an attitude scale
3 -|- D  Part of a constellation of several scores
4 -|- E  Part of a constellation based on several variables at level 3

Fig. 1. The specificity-generality continuum for attitude toward war.


Can we find greater specificity than that implied in an item re- 
sponse? Most assuredly! To give an answer to any one of the items 
in the war attitude scale, our subject must think of specific instances
in his own past behavior that would lead him to agree with the
statement and of other instances that would lead him to disagree.
As soon as he has thought of several such situations, both pro and
con (point A on our continuum), he may find that these separate
experiences are not in complete accord. And, in general, these sepa-
rate experiences will exhibit a degree of intercorrelation even lower
than that among the item responses we have already discussed. This
forces our subject into a certain degree of generality which he must
abstract from his specific experiences in order to respond, as we have
asked him to respond, to the items (point B) in our war attitude
scale.

Let us return to point C in Fig. 1 and see if we can find greater 
generality. If there are attitudes toward war more general than those 
represented by position C, they will exhibit higher intercorrelations 
among themselves than will those scaled at position C. Attitude 
toward war, as measured by the total score on an attitude scale, has 
been found to correlate with attitudes toward capital punishment 
and the treatment of criminals (see Chap. 4). The cluster formed 
by these three attitude variables can be considered under the more 
general heading of humanitarianism, and this can be represented 
at point D on our specificity-generality continuum. 

If we can group together under a generalized frame of reference, 
such as radicalism-conservatism, several of the attitude clusters 
characteristic of point D on our continuum, we can arrive at point 
E. Here we find still higher intercorrelations among our separate 
measures and therefore even greater generality. An example would 
be a person who secures conservative scores on the three attitude 
variables, religionism, humanitarianism, and nationalism, described
in Chap. 4. These variables form a meaningful cluster because an
individual is apt to be forced into adopting a conservative or a
radical position with respect to each one of them.

Let us now summarize the various specificity-generality levels we
have distinguished and arrange them in order as follows:

Level 0. This is a behavioral response, overt or covert, made in
a specific situation. It is unique to the time, place, and individual
who made it. It is, for this very reason, usually devoid of scientific
value or at least is ordinarily not tapped by the psychological meas-
uring techniques in current use.

Level 1. This is the response to one item, such as that included
in a Thurstone attitude scale [...] measuring techniques [...] specific
and unique (although it is neither) a response as is sought or desired.

Level 2. Any summary of the responses to several of the items in a
psychological test, such as a summary of the responses to the items 
in a Thurstone attitude scale, characterizes this level of specificity- 
generality. The response is an abstraction and is twice removed from 
the uniqueness and specificity of a single personal event such as that 
characterizing Level 0. It will be consistent with psychological 
terminology if we label our result at this level a trait score. It will be
seen, therefore, that traits cannot be specific. We have eliminated 
this possibility by definition. 

Level 3. The response derived from a grouping or constellation 
of several traits, such as those which define the attitude variables 
religionism, humanitarianism, and nationalism, characterizes this 
level of our specificity-generality continuum. 

Level 4. Popular and even some scientific discussion (see Stagner) 
still continues to group under general titles such as radicalism- 
conservatism or introversion-extroversion, various syndromes or 
clusters defined as characterizing Level 3. Thus a person who as- 
sumes a conservative attitude in one area (say, religion) may also 
assume a conservative attitude in a second albeit independent area 
(say, humanitarianism). 

We may profitably conclude our discussion on specificity-gen- 
erality by pointing out that as soon as the psychologist becomes 
interested in any phenomenon he, like any other scientist, begins to
abstract that which is common to a number of observed situations.
This forces him to consider generalities and not unique occurrences.
It is up to the psychologist to determine, however, how far
toward the generality end of the continuum he wishes to proceed.
He can work effectively at any of the levels we have described. 


DETERMINANTS OF PERSONALITY STRUCTURE 


In order that we may accomplish the objectives set forth in a 
preceding section, we must discover the relations obtaining between
what we can call predictors and what we can call predictands. From
our study of elementary chemistry, we should all be familiar with
Boyle’s law. If a chemist knows the pressure exerted on a gas and
holds temperature constant, he can predict its volume. The known
pressure is our predictor and the unknown volume is our predictand.
The predictor is what we have to start with or what we know. The
predictand is what we do not have immediately given or what we 
do not know. Something about it is to be predicted from something 
else, our predictor. 
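
The predictor-predictand relation in Boyle’s law can be written out as a short computation. This is a minimal sketch: the reference state of 2.0 litres at 1.0 atmosphere is a hypothetical value chosen for illustration, and the function name is ours.

```python
def predict_volume(p1: float, v1: float, p2: float) -> float:
    """Boyle's law at constant temperature: P1 * V1 == P2 * V2.

    The new pressure p2 is the predictor (known); the returned
    volume is the predictand (to be estimated).
    """
    if p1 <= 0 or v1 <= 0 or p2 <= 0:
        raise ValueError("pressures and volume must be positive")
    return p1 * v1 / p2


# Hypothetical reference state: 2.0 litres at 1.0 atmosphere.
REFERENCE_PRESSURE = 1.0
REFERENCE_VOLUME = 2.0

if __name__ == "__main__":
    for pressure in (0.5, 1.0, 2.0, 4.0):
        volume = predict_volume(REFERENCE_PRESSURE, REFERENCE_VOLUME, pressure)
        print(f"pressure = {pressure:.1f} atm -> predicted volume = {volume:.2f} L")
```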

It is our hope in the study of personality that our various methods 
of measurement will fall primarily into the class of predictors. From 
the knowledge or facts we gain from a personality test we hope to be 
able to make some prediction about behavior: either future behavior 
or at least behavior that is not immediately available for present 
observation. It is not necessary, however, that we attempt to use our 
measures of personality as predictors only. We may wish to consider 
other items as our predictors and see if from them we can estimate 
an individual’s responses on a personality test. It is important that 
we understand this dual usage, for it is possible that in some situa- 
tions we may decide that a particular method of personality meas- 
urement is satisfactory as a predictor but not as a predictand, or 
vice versa. It is our hope, however, that ultimately, methods of 
personality measurement can be made equally satisfactory for both 
purposes. 

We will now find it helpful to discuss several studies which illus- 
trate the utility of four predictors of personality. In addition these 
studies will show that a knowledge of personality is of material 
importance in helping us attain our three afore-mentioned objec-
tives: the description, the prediction, and the control of individual
behavior, group behavior, and their mutually interdependent
relations.


Constitutional Determinants. In “Psychosomatic Studies of
Some Aspects of Maternal Behavior” Levy presents data to show
that maternal feelings are related to duration of menstrual flow, and
possibly to diameter of areola, but not to a number of other variables
such as body weight, height, shoulder or hip width, and gross size
of breast. Levy’s data show that we can distinguish between women
who are apt to develop a high degree of maternal feeling and women
not apt to do so, and we can do this by reference, primarily, to
duration of menstrual flow and, secondarily, by reference to areolar
diameter.

To establish these facts Levy interviewed 72 mothers and classi-
fied them on degree of maternal feeling. This did not prove to be an
easy task, but Levy states that all cases rated as “high maternal”
and all cases rated as “low maternal” are reasonably clear-cut. An 
example of a case included in each of these categories, as well as one 
included in a midgroup, is given below: 


High Maternal. Mother of four children. As a young child her favorite game was 
taking care of dolls, dressing them, putting them to bed. She played with dolls until 
age 14 or 15. She used to make visits among her mother’s friends to take care of 
their babies. When she thought of being a mother, she hoped to have six children, 
and have them as soon as possible. When she saw a pretty baby on the street, she 
had a strong urge to take it in her arms and hug it. She was a “baby-carriage 
peeker” before, and after, marriage. In her relations with men she was always 
maternal; much more, she said, than they liked. Having had four children she is 
now pregnant with her fifth. She had a nurse for her first child and was miserable, 
she said, because she couldn’t take full care of it. She hated the hospital rule of not 
having the baby in her room. She had a copious supply of milk. Her husband stated 
that she really spoiled the children; that every so often she fought against this 
tendency and became severe, to protect them from her spoiling. But the children 
“see through it.” 

Midgroup. Mother of one child. She played with dolls probably to age five or six. 
She was not especially interested in maternal play in childhood. For a period of 
two years, age twelve and thirteen, she used to look after some neighbor’s children 
because she liked to. She never anticipated the number of her children or had fantasies 
of being a mother. When she saw a pretty baby on the street, she was interested 
but had no urge to hold it, or have one of her own. After marriage, however, she 
became pregnant very soon and willingly. She voluntarily took sole care of her child 
and is evidently an affectionate mother. Her pregnancy was very difficult and she 
was warned by her physician against further impregnation. 

Low Maternal. Mother of two children. She never played any maternal games in 
childhood, nor played a maternal role to another child. She had very little interest 
in dolls and stopped playing with them when about age six. When she saw a pretty 
baby on the street she was not at all interested. As an adolescent, she never in- 
dulged in the phantasy of being a mother and having children. She was ambitious 
to get married but never thought about having children. As a mother she felt quite 
incompetent. She took her children off the breast after two weeks, because she 
didn’t like it; she felt like a cow, she said. She still hates the physical care of children, 
though she is a dutiful mother and rather affectionate. She never was maternal 
toward men. Her interests have always been feminine, and she has been quite
popular with men.


Levy found no correlation, either in his main group of 72 mothers 
or in subsequent groups of women, between ratings of maternal 
behavior and size of nipple, height, weight, hip width, and age at
first menses. He did find a striking correlation of .58, however,
between maternal behavior and length of menstrual flow. Women
whose menstrual flow lasts six or more days tend, in general, to be 
high in maternal feeling, while women whose menstrual flow lasts 
four days or less tend to be low in maternal feeling. 

Correlations with areolar diameter, though positive, are not so 
high. Those reported are .23 and .33. Being based upon approxi- 
mately 200 cases, however, they can be considered significant. 
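
The claim of significance can be checked with the usual t test for a correlation coefficient. The sketch below is ours, not Levy’s computation: it assumes exactly 200 cases (the text says only “approximately 200”) and uses the two-sided 5 per cent point of the normal curve as an approximate cutoff.

```python
import math
from statistics import NormalDist


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Assumption: exactly 200 cases; the text says "approximately 200".
N_CASES = 200
# Two-sided 5 per cent cutoff from the normal curve, used here as a
# close approximation to the t distribution with this many cases.
CUTOFF = NormalDist().inv_cdf(0.975)

if __name__ == "__main__":
    for r in (0.23, 0.33):
        t = correlation_t(r, N_CASES)
        print(f"r = {r:.2f}  t = {t:.2f}  exceeds cutoff {CUTOFF:.2f}: {t > CUTOFF}")
```

Under these assumptions both coefficients yield t values well beyond the cutoff, which agrees with the statement that they can be considered significant.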

While it is known that duration of menstrual flow is not deter-
mined entirely by constitutional factors, it would be difficult to
conclude that areolar diameter had anything but a constitutional,
or at least a somatic, basis. Therefore we may consider Levy’s study as
illustrating that a personality trait can have in part at least a con- 
stitutional or somatic basis. 

Group Determinants. Our next example shows how personality 
traits may have their origin in social custom, habit, or attitude. 
Davis and Havighurst have presented data showing that both social 
class and color are closely related to certain practices of child rear- 
ing. In their study, Davis and Havighurst interviewed 200 mothers 
representing two class and two color groups: middle and lower class, 
and white and Negro. There were 50 mothers in each group, and 
all were resident in Chicago. Davis and Havighurst’s chief conclu- 
sions are summarized by Kluckhohn and Murray as follows: 


1. There are significant differences in child-rearing practices between the middle 
and lower social classes in a large city. The same type of differences exist between 
middle and lower-class Negroes as between middle and lower-class whites. 

2. Middle-class parents are more rigorous than lower-class parents in their train-
ing of children for feeding and cleanliness habits. They also expect their children to
take responsibility for themselves earlier than lower-class parents do. Middle-class
parents place their children under a stricter regimen, with more frustration of their
impulses, than do lower-class parents.

3. In addition to these social-class differences, there are some differences between
Negroes and whites in their child-rearing practices. Negroes are more permissive
than whites in the feeding and weaning of their children, but they are much more
rigorous than whites in toilet training.


We learn, then, that there are differences in child training due to 
color and social class. Knowing the color and social class and treating 
these as our predictors, we can make certain inferences about the 
type of child training (our predictand). Let us hasten to add with 
regard to the color differences that we have in no way proved that
there are racial differences. Race is a biological concept, and it is 
pretty generally agreed among scientists that there are no entirely 
satisfactory and clear-cut biological standards by which membership 
in a race can be determined. The differences under consideration 
have not been proved to have biological origins and therefore cannot 
be said to have racial significance. 

Role Determinants. Under ordinary circumstances a girl expects 
to model her behavior, as have other girls before her, upon that of 
adult women, and a boy expects to model his behavior, as have
other boys before him, upon that of adult men. Sometimes, however, 
the maturing boy or girl will reject all attempts to achieve what 
would appear to be his or her proper adult sex role, and will attempt 
to achieve status by assuming some of the characteristics, habits, 
and attitudes of the opposite sex. What can cause a boy to reject the 
masculine role and to assume a feminine role? And what can cause 
a girl to reject the feminine role and to assume a masculine role? 

In a study conducted by the author it was found that harsh or 
irrational childhood discipline, rejection by parents, and other
parental behavior not condoned as leading to good mental hygiene 
apparently can cause some boys to reject the masculine role and to 
assume the feminine role, and can cause some girls to reject the 
feminine role and to assume the masculine role. Kind, rational, but 
firm childhood discipline and the sort of discipline condoned as 
leading to good mental hygiene are apparently conducive to boys 
accepting their intended masculine sex role and to girls accepting 
their intended feminine adult sex role. Thus, “proper” childhood 
discipline can lead a child to accept his or her normal adult sex role, 
and “improper” childhood discipline can lead a child to reject his 
or her normal adult sex role. The assumption of the opposing sex 
role can be viewed as an attempt upon the part of the person con- 
cerned to achieve a status which apparently is otherwise going to be 
denied. 

Situational Determinants. As our last example of the relation 
between a personality predictor and a personality predictand we 
may cite a study conducted by J. McV. Hunt when he was on the
staff of St. Elizabeths Hospital in Washington, D.C. Through one
of the patients at the hospital, Hunt was able to trace the histories
of a group of childhood playmates who, in one setting, were placed
in a situation that required them (to achieve status) to participate 
in a number of sexual perversions and, at the same time, to partici- 
pate in a series of intensely religious revival meetings. Participation 
in the revival meetings gave the boys a set of values completely 
incompatible with their sexual indulgences and produced in them a 
serious conflict situation. It is our purpose, as it was Hunt’s, to note 
the effect of this conflict upon resultant personality development. 

In collecting his material, Hunt secured data on 15 boys. He 
secured data as to which of these boys participated in neighborhood 
athletics, in the perversions, in the revivals, and in the use of al- 
coholic beverages. Hunt found that all except two of seven boys who 
participated in both the sex perversions and in the revival meetings
were committed to St. Elizabeths Hospital. Four boys who par- 
ticipated in the perversions but not seriously in the revivals were 
not committed to the hospital, nor were four other boys, three of 
whom participated in neither the perversions nor the revivals and 
one of whom participated only in the revivals. 
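
With only 15 cases it is fair to ask whether so sharp a split could arise by chance. The sketch below applies Fisher’s exact test, a technique we introduce for illustration and one Hunt is not reported to have used. The counts are read from the paragraph above: 5 of the 7 boys who took part in both activities were committed, and none of the other 8.

```python
from math import comb


def fisher_one_sided(committed_in_group: int, group_size: int,
                     total_committed: int, total: int) -> float:
    """One-sided Fisher exact probability of observing at least
    committed_in_group commitments inside the group if commitment
    were unrelated to group membership."""
    upper = min(group_size, total_committed)
    ways = sum(
        comb(group_size, k) * comb(total - group_size, total_committed - k)
        for k in range(committed_in_group, upper + 1)
    )
    return ways / comb(total, total_committed)


# Counts taken from the text: 7 boys in both activities (5 committed),
# 8 other boys (none committed), 15 boys in all.
P_VALUE = fisher_one_sided(committed_in_group=5, group_size=7,
                           total_committed=5, total=15)

if __name__ == "__main__":
    print(f"one-sided exact probability = {P_VALUE:.4f}")
```

On these counts the chance probability is well under 1 in 100, though the result rests on a very small and unselected group and is illustrative only.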

Hunt found that the ages of commitment to St. Elizabeths ex- 
tended over a fairly wide range. This led him to postulate a con- 
tinuum of frustration tolerance; that is, to some boys their conflict 
became a real and earnest threat early in life and caused early mental 
breakdown and commitment. To others, the conflict did not become 
serious, or if serious, did not become a threat until after the lapse 
of a number of years. Finally it came, but only when their higher 
resistance thresholds had been reached. 

We have now examined four predictors or determinants of per- 
sonality: constitutional, social, role, and situational. These consti- 
tute only four examples, however, and cannot in any sense be con- 
sidered as exhaustive of the type of predictor or determinant needing 
investigation. Nevertheless, the student should have gained from 
these examples some insight into the importance of personality in 
the description, understanding, prediction, and control of human
behavior.


GUIDING PRINCIPLES FOR RESEARCH AND STUDY


The guiding principles for research and study in the field of per-
sonality measurement are no different from those which obtain for
research and study in any other field of psychological investigation.
These are basically three in number: a firm adherence to the prin- 
ciples of experimental psychology, a firm adherence to the principles 
of sound statistical analysis, and an adequate emphasis upon the 
theoretical framework within which any set of data is to find its 
meaning. Let us briefly consider each of these principles. 

Adherence to the Principles of Experimental Psychology. Too 
many investigators in the field of personality measurement contend 
or imply by their actions that the principles of sound experimental 
psychology need not be applied in the field of personality measure- 
ment. This is, indeed, a most unfortunate circumstance. Nothing 
can delay progress quite so much as failure to abide by certain basic, 
elementary, and simple rules of experimental procedure. 

One of the most basic and fundamental rules in experimental 
psychology, as indeed it is in all science, is that we control all vari- 
ables known or thought to have significance for the problem at hand. 
We can then permit alterations in one of the variables (the one we 
select for our predictor) and see what changes it causes in one or 
more of the remaining variables (our predictands). 

As a chemist or physicist interested in determining anew the 
relations expressed in Boyle’s law we must control pressure, volume, 
and temperature. We hold temperature constant, allow pressure to 
vary, and see what changes this causes in volume. If we are inter-
ested in verifying Charles’s law: that volume varies with tempera-
ture, we hold pressure constant, allow temperature to vary, and see
what changes this produces in volume. If in attempting to verify 
Boyle’s law, we neglect to hold temperature constant; or if in at- 
tempting to verify Charles’s law, we neglect to hold pressure con- 
stant, we would never, as physicists or chemists, be able to dis- 
cover the fundamental relations involved in the famous formula 
PV = kMT. 
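
The cost of neglecting control can be shown with the formula itself. This is a minimal sketch in which the constant k, the mass M, and the pressure and temperature readings are all hypothetical values chosen for illustration. When temperature is held constant the product PV stays fixed, as Boyle’s law requires; when temperature is allowed to drift, the same pressures no longer reveal the law.

```python
K = 1.0      # hypothetical proportionality constant
MASS = 1.0   # hypothetical fixed mass of gas


def volume(pressure: float, temperature: float) -> float:
    """Solve PV = kMT for V."""
    return K * MASS * temperature / pressure


def pv_products(pressures, temperatures):
    """P * V for each paired (pressure, temperature) observation."""
    return [p * volume(p, t) for p, t in zip(pressures, temperatures)]


PRESSURES = [1.0, 2.0, 4.0]
# Temperature held constant: the controlled experiment.
CONTROLLED = pv_products(PRESSURES, [300.0, 300.0, 300.0])
# Temperature allowed to drift: the uncontrolled experiment.
UNCONTROLLED = pv_products(PRESSURES, [300.0, 330.0, 360.0])

if __name__ == "__main__":
    print("PV with temperature controlled:  ", CONTROLLED)
    print("PV with temperature uncontrolled:", UNCONTROLLED)
```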

In the fields of experimental psychology and personality measure- 
ment it is much more difficult to control the significant variables 
than it is in traditional physics. This does not mean, however, that 
we can disregard such control. As difficult as it may be, we must in 
some manner exert this control or forever flounder in superstition 
and error. 

If this admonition be interpreted to mean that the psychologist 
interested in the field of personality measurement must ape in 
complete detail the procedures of the brass-instrument experimenter,
the reader has seriously missed the author's point. Experimental
control does not need to imply brass-instrument or laboratory 
psychology. It implies nothing more than a careful consideration 
and control of all factors which may affect an investigator's conclu- 
sions. To illustrate we may point out that the psychologist interested 
in determining sex differences on some personality test should assure 
himself: 


1. That the items on the test adequately tap the personality functions to be 
studied 


2. That the men and women tested constitute representative or random samples 
of all men and women 


3. That sufficient cases are studied to rule out chance variation as a principal 
component in the explanation of results 


4. That factors such as age are held fixed or constant or are taken appropriately 
into account so as not to obscure the significance of the results 

5. That the men and women tested have had equal opportunity to learn the 
personality habits or characteristics under study and have been equally motivated 
to reveal them in the testing situation designed by the investigator 


Adequate attention to points such as these constitutes the type of
control possible and necessary in the field of personality measure-
ment. Many investigators secure and publish data with too little
thought having been given beforehand to the control of factors such
as we have just discussed. Once in a while an investigator may be
lucky and will secure significant data even though no conscious
thought was given to such control. This represents a rare and in-
frequent occurrence, however, and the practice of ignoring such
control is not to be recommended.

Adherence to the Principles of Sound Statistical Analysis. The
basic principle to be followed in this connection is to be sure of the
soundness and adequacy of the original experimental design. Data
collected through sloppy experimental procedures cannot be saved
by any legerdemain of statistical analysis. Valid experimental data
can be invalidated, however, by inadequate or unsound statistical
methods.

Having once decided upon [...] well to consider the alternative [...]
differ considerably in merit in terms of the type of statistical analysis
that can be used in connection therewith. We can illustrate this by 
an example in the field of learning. Suppose it is the purpose of an 
experimenter to determine which method of classroom instruction, 
A or B, is the better. He can proceed experimentally in two ways: he 
can equate two groups of subjects and can teach one group by 
method A and the other group by method B, or he can equate two 
learning skills and can teach both skills to the same group of sub- 
jects, one by method A and the other by method B. In the first 
procedure the basic formula to determine the significance of the 
difference in the results produced by the two methods is 


$$CR = \frac{\text{difference}}{\sqrt{\sigma_a^2 + \sigma_b^2}}$$


This gives the standard error of a difference between two uncorre- 
lated samples or, in this instance, between two different subject 
populations. If our investigator chose to follow the second of our 
two experimental procedures he would find his major basic formula 
to be 

$$CR = \frac{\text{difference}}{\sqrt{\sigma_a^2 + \sigma_b^2 - 2r_{ab}\sigma_a\sigma_b}}$$


This formula differs from the first by taking into account any cor- 
relation that may exist between the two samples to be compared. 
The formula shows that the higher this correlation the less the value 
of the standard error. Thus a given difference may be found to 
possess differing degrees of significance depending upon the com- 
position of the basic experimental groups. If the two groups to be 
compared consist of the selfsame subjects, there is obviously a
correlational factor to be considered. And when this correlational 
factor is considered, it leads to greater precision in defining any 
difference that may be obtained. 
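
The effect of the correlational factor can be seen in a small numerical illustration. The Python sketch below is offered only as an illustration: the function name `critical_ratio` and the figures used (a difference of 3 points, a standard error of 1 for each mean, and a correlation of .50) are assumptions chosen for the example and are not taken from any actual experiment. It uses the customary expression for the standard error of a difference, in which the term 2rσσ is subtracted when the two sets of scores are correlated.

```python
import math


def critical_ratio(difference, se_a, se_b, r=0.0):
    """Critical ratio of a difference between two means.

    se_a and se_b are the standard errors of the two means; r is the
    correlation between the two sets of scores (0 for independent groups).
    """
    se_difference = math.sqrt(se_a**2 + se_b**2 - 2 * r * se_a * se_b)
    return difference / se_difference


# Hypothetical figures, chosen only for illustration.
cr_independent = critical_ratio(3.0, 1.0, 1.0)        # two equated groups
cr_correlated = critical_ratio(3.0, 1.0, 1.0, r=0.5)  # same subjects, r = .50

print(round(cr_independent, 2))  # 2.12
print(round(cr_correlated, 2))   # 3.0
```

With these assumed figures the same difference yields a larger critical ratio when the correlation is taken into account, which is the point made in the text.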

Emphasis upon Theoretical Framework. Some investigators feel 
that the collection or discovery of a new fact, if it can be established 
as such, is a sufficient goal for research. Other investigators feel that 
facts as such are not worth the time, money, and effort needed to 
establish them unless they can be integrated into some theoretical 
or conceptual scheme. Perhaps we can reach a compromise between 
these two points of view and allow that each side of the argument has 
merit. It would seem, however, that future advancement in the
field of personality measurement will require much theory formula- 
tion. This theory formulation can then be followed by the gathering 
of facts to verify or disprove the theories that may be involved. 
Should isolated facts be discovered they should, of course, be re- 
corded, but it would seem that most research workers would find 
their work more interesting and, indeed, more profitable if they first 
developed theoretical guideposts to steer them along their various 
ways. It is frequently said that a theory is no good unless there are 
facts to support it. This may well be true, but let us also heed the 
warning given by Sir Arthur Eddington: that facts are no good 
either, unless they are supported by an adequate theory. 


2 


INTERESTS: AN EMPIRICAL APPROACH 


In this and in the following chapter, we shall be concerned with 
techniques for the measurement of interests. In this chapter we 
propose to discuss an empirical approach to the problem, and in the 
next chapter, a rational approach. The latter approach was used by 
Dr. G. Frederic Kuder in the development of his Preference Records; 
while the empirical approach, the subject of our present chapter, was 
used by Dr. Edward K. Strong, Jr., in the development of his 
Vocational Interest Tests. 

The chief purpose of the Strong Vocational Interest Tests is to 
show a person the extent to which his interests correspond to those 
of successful men or women in a variety of occupations. In addition, 
they can show the extent to which a person’s interests correspond 
to those characteristic of men in contrast with those characteristic 
of women (masculinity-femininity). The men’s blank can also show 
the extent to which a person’s interests correspond to those of 25- 
year-old men in contrast with those of 15-year-old boys (interest 
maturity), and the extent to which they correspond to those of 
professional and businessmen in contrast with those of unskilled 
workmen (occupational level). 

Each of the two tests consists of a total of 400 items. In terms of 
their content, those in the blank for men can be classified as in Table 
5. In most of these parts the subject is asked to draw a circle around 
one of three letters: L, I, or D, to indicate whether he likes, is indif- 
ferent to, or dislikes the occupation, school subject, amusement, 
activity, or kind of person in question. In Part VI, however, in 
indicating preference among activities, the subject is asked (in each 
of 4 groups of 10 activities) to check the 3 he likes best, the 3 he 
likes least, and to check the 4 remaining as neutral. In Part VII, the 
section requiring a comparison between activity items, e.g., between
streetcar conductor and streetcar motorman, the subject must
indicate whether he prefers one of these two activities in contrast to 
the other or whether he likes them equally well. And, finally, in
indicating his present abilities (Part VIII) the subject is asked to
indicate whether each one of 40 statements about abilities characterizes
him, does not characterize him, or whether he cannot decide.


Table 5. Type of Item in the Strong Vocational Interest Test for Men*


Item                                          Number of items

Occupations..................................      100
School subjects..............................       36
Amusements...................................       49
Activities...................................       48
Kinds of people..............................       47
Order of preference for activities...........       40
Comparison between two items.................       40
Present abilities............................       40


PURPOSES OF TEST 


The scores on the Strong Vocational Interest Test can be put to a
variety of uses. Four of the most important are educational guidance,
vocational guidance, vocational selection, and research.

Educational Guidance. [...] undecided about what cour[...]
scores on the Strong Vocati[...] help. If his scores indicate
t[...]ful lawyers, [...]n to take one or two [...] he likes them.
If the [...]ction, he can consider [...] Test as confirmatory
[...]ndeavor, [...]

Vocational Guidance. We can echo here what we have just said
in our discussion on educational guidance. It is sometimes, and not
infrequently, the case that a student will near the end of his collegiate
career without being entirely certain as to which specific vocation 
he wishes to enter. Reference to the scores on the Vocational In- 
terest Test, along with other considerations, may help him reach a 
decision. Here, as in the case of educational guidance, two courses of 
action are possible. The student may choose to follow the indications 
of the test, or he may choose to ignore them. It would seem proper, 
however, for a student seriously to consider entering any occupation 
in which he receives a high score; and he should seriously consider 
not entering any occupation in which he receives a low score. If he
decides not to enter an occupation in which he receives a high score
or to enter an occupation in which he receives a low score, he should
have good and cogent reasons therefor. 

What the student gets from the Strong Vocational Interest Test is 
an indication of whether or not his own interests, his own likes and 
dislikes, his own preferences and aversions correspond to or do not 
correspond to those of successful men or women in the occupations 
designated. For example, it tells the student whether his interests 
are similar to, or dissimilar to, those of successful lawyers; similar 
to, or dissimilar to, those of successful psychologists; similar to, or 
dissimilar to, those of successful engineers; similar to, or dissimilar 
to, those of successful bankers; and so on. Strong’s basic theory, well 
substantiated by the empirical facts which he has assiduously 
collected for more than twenty-five years, is that when other factors 
such as ability are equal, a person will be much happier and pre- 
sumably more successful in an occupation in which he finds a large 
number of men with interests similar to his own. This does not mean, 
of course, that a person cannot be successful in an occupation if his 
interests are dissimilar to those of men already engaged in it. But
it does seem logical to expect that he will be less happy in such an
occupation than in one having men in it with interests corresponding
to his own. 

It should be clearly understood that the Strong Vocational Inter- 
est Test gives no indication of ability. This must be discovered 
through intelligence or aptitude tests, or in other ways, if such tests 
are not available. If the Strong Vocational Interest Test indicates 
that a student has interests like those of successful engineers, but a 
mathematical aptitude test shows lack of facility in mathematics,
the student should probably steer clear of the field of engineering. 

However, if a student has the necessary aptitude and the requisite 
degree of intelligence to enter, let us say, either engineering or law,
and the Strong Vocational Interest Test yields a high interest score
in engineering but a low score in law, the student should carefully
consider going into engineering before deciding definitely against
it, and should carefully consider not going into law if, prior to the
test, he had some intention of entering this profession.

A point that must be clearly understood in interpreting a score 
on the Strong Vocational Interest Test is that it does not indicate 
interest in an occupation. This may be a technicality, but a high
score on the law scale does not indicate an interest in law. It merely 
shows that the subject’s interests, whatever they may be, correspond 
to those possessed by the majority of successful lawyers. Fortun- 
ately there seems to be a positive correlation between having in- 
terests like those of successful practitioners in an occupation and 
having an interest in the activities involved in the occupation. 
Nevertheless, the distinction is an important one and should not be 
glossed over in interpreting a score. 

Vocational Selection. Strong developed his Vocational Interest 
Test primarily as a means of helping college students decide upon 
appropriate courses of study and suitable vocations. If the test does 
this, and we know that it does, it would also seem reasonable that 
it should be of assistance to employers seeking applicants in those 
occupations for which the test can be scored. If the test helps a 
college student to decide that chemistry is the line of work he should 
follow, it would seem that an employer of chemists could profitably 
use the test as an aid in selecting applicants for employment.

While this sounds reasonable, and in fact the test can be used in
this way, it is necessary for us to be on guard with respect to certain
possible flaws in the logic involved. When an employer selects an
applicant for employment, he is interested in selecting a person that
he thinks is going to be successful. To do this he needs some method
of distinguishing applicants who will later become successful from
applicants who will later become unsuccessful. This is a different
problem, and may have a different answer, from that involved in
deciding whether a person’s interests are like those of chemists as a
group in contrast to those of men engaged in other occupational pursuits.
Strong has discussed this problem in detail and points out that the
differences between a successful and an unsuccessful chemist may 
not, and need not, in fact, correspond to the differences between all 
chemists (successful and unsuccessful) and other occupational groups 
(including successes and failures). In the field of life insurance there 
is an overlap, however, between the kinds of things that differen- 
tiate most life insurance salesmen from men-in-general, and suc- 
cessful life insurance salesmen from unsuccessful life insurance 
salesmen. This fact has made the Vocational Interest Test an 
exceedingly useful instrument as an aid in the selection of life insur- 
ance salesmen. 

When the Vocational Interest Test is used for educational or 
vocational guidance, the norms which Strong has supplied should be 
used. When the test is used in vocational selection, however, new 
validation data must be obtained, and new norms must be prepared. 
These are necessary to show whether or not the test will be useful 
for the purpose intended. It is quite possible, and it has been re- 
peatedly demonstrated, that a test can be valid for the selection of 
employees in one company but invalid for selection in another com- 
pany. In this respect, the Strong Vocational Interest Test is no 
different from any other. It must be validated anew for each situa- 
tion in which it is intended to be used. The test possesses an advan- 
tage, however, that no other selection test possesses. It not only 
can be made to yield a new scale for the specific occupation in 
question but also yields information on the previously developed 
occupational scales as well. Frequently a pattern of interests tells 
much more about an applicant than does the score on any particular 
scale, even though this scale is the one most pertinent to the position 
in question. 

Research. The Strong Vocational Interest Test has proved a 
most useful instrument in research on the structure of that segment 
of personality covered by the concept interest. Strong himself has 
done a monumental job for more than twenty-five years, but many 
other investigators have made good use of his test also. In most of 
the remaining sections of this chapter we shall have occasion to 
report or comment upon elements of personality structure which 
would not now be known or which would be known less well, were 
it not for the research made possible by the Strong Vocational
Interest Test. To give a few examples:

1. We now know a good deal about how interests change with age and what
significance these changes have in vocational guidance.

2. We now know a good deal about how the interests of various occupational
groups compare with each other. This provides us with added insight useful in
vocational guidance.

3. We now know that higher and lower occupational groups differ in terms of
their interests. Thus, through interests, we can tell something not only about the
direction in which a person can appropriately bend his efforts but also something
about the job level he should strive to attain.

4. We now know something about the extent to which certain interests are
related to certain personality traits, to intelligence, to various aptitudes, and so
forth, and we find that interests constitute a definite and measurable segment of
personality, not adequately covered by other types of personality tests.


DEVELOPMENT OF OCCUPATIONAL SCALES 


We have referred to Dr. Strong’s approach as an empirical one. 
By this we mean that every method of scoring which Dr. Strong 
has devised has been based directly upon demonstrated differences 
among contrasting criterion groups. Strong and other members of 
the now famous Carnegie group (including, among others, Miner, 
Ream, Freyd, and Moore) started research which later led to the 
Vocational Interest Test, with one basic theory: that occupational 
groups can, in terms of their interests, in terms of their likes and 
dislikes, and in terms of their preferences and aversions, be dis- 
tinguished from one another; that is, that the members of one 
occupational group (say, chemists) will have a different set of likes 
and dislikes from those of the members of another occupational
group (say, lawyers). To check this theory, Strong has compared
the interests of various professional groups, not directly with each
other, but with what he calls men-in-general. For example, he has
compared the interests of lawyers with those of men-in-general, and
he has compared the interests of chemists with those of men-in-general.
He finds not only that lawyers and chemists differ from
men-in-general but also that they differ from men-in-general in
different ways. Therefore, they are different from each other.

The procedures which Strong has used in the development of each
of his occupational scales are as follows:

1. Representatives of an occupational group complete the blanks.

2. The number of men who answer L, I, and D to each item is determined.

3. These numbers are translated into percentages.

Table 6. Determination of the Scoring Weights for the First 10 Items of the
Engineering Interest Scale*


                           Percentage of     Percentage of     Difference in     Scoring
                           men-in-general    engineers         percentages       weights
First 10 items              L    I    D       L    I    D       L    I    D       L   I   D

Actor (not movie)          21   32   47       9   31   60     -12   -1   13      -1   0   1
Advertiser                 33   38   29      14   37   49     -19   -1   20      -2   0   2
Architect                  37   40   23      58   32   10      21   -8  -13       2  -1  -1
Army officer               22   29   49      31   33   36       9    4  -13       1   0  -1
Artist                     24   40   36      28   39   33       4   -1   -3       0   0   0
Astronomer                 26   44   30      38   44   18      12    0  -12       1   0  -1
Athletic director          26   41   33      15   51   34     -11   10    1      -1   1   0
Auctioneer                [?]  [?]  [?]     [?]  [?]  [?]      -7  -11   18      -1  -1   2
Author of novel            32   38   30      22   44   34     -10    6    4      -1   1   0
Author of technical book   31   41   28      59   32    9      28   -9  -19       3  -1  -2


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.


Table 7. Scores Obtained by One Subject on Six Scales of the Vocational Interest Test*


                                                       Life in-              Y.M.C.A.
                           Re-       Engi-             surance               secre-     Account-
Item                       sponse    neer     Lawyer   salesman   Minister   tary       ant

Actor (not movie)          D           1        -1        -1         -2        -1         -1
Advertiser                 D           2         1        -1          0        -2         -1
Architect                  D          -1         1         1          0         0          0
Army officer               I           0         0         0         -1         0          0
Artist                     [?]         0         0        -1          1         1          0
Astronomer                 L           1         0         0          2         0          0
Athletic director          I           1         0         0          1         0          0
Auctioneer                 I          -1        -1         0          0         0          0
Author of novel            I           1         0         0          0         0          0
Author of technical book   L           3         0        -1         -1        -1          1
Total 10 items                         7         0        -3          0        -3         -1
Total 400 items                      182        23      -115        -91      -134        -33


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.
Table 8. Norms for the Architect Scale*


                                          Percentile ranks
Raw       Standard               241           306             285
score     score       Rating     architects    Stanford [?]    Stanford [?]

 220        69        A            99
 210        67        A            98
 200        65        A            97
 190        63        A            92
 180        61        A            87
 170        59        A            80
 160        57        A            75            99             [?]
 150        55        A            65            99              99
 140        53        A            60            99              99
 130        51        A            51            98              98
 120        49        A            41            98              98
 110        47        A            34            97             [?]
 100        45        A            30           [?]              93
  90        43        B+           25            93              91
  80        41        B+          [?]            90              87
  70        39        B+           15           [?]             [?]
  60        37        B           [?]            83              80
  50        35        B             7            77              76
  40        33        B-            6            73              70
  30        31        B-            5            67              65
  20        29        B-            3           [?]             [?]
  10        27        C+            2            55             [?]
   0        25        C+            1            50              46
 -10        23        C             1            44              40
 -20        21        C             1            36              35
 -30        19        C             1            31              29
 -40        17        C                         [?]             [?]
 -50        15        C                          20              18
 -60        13        C                          15              13
 -70        11        C                          13               9
 -80         9        C                          10               7
 -90         7        C                           7               4
-100         5        C                           5             [?]
-110         3        C                         [?]             [?]
-120         1        C                           3             [?]
-130        -1        C                           2             [?]
-140        -3        C                           1
-150        -5        C                           1
-160        -7        C                           1


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.
4. These percentages are compared with the corresponding percentages for
men-in-general. 

5. By means of an appropriate formula or chart, item weights reflecting the 
magnitude of the differences between the percentages for the occupational group 
in question and the men-in-general group are assigned. 


Table 6 gives the results of the preceding steps for the engineering 
scale for the first 10 items of the test. It shows the percentage dis- 
tribution of answers for engineers and for men-in-general, the dif- 
ferences between these percentages, and the scoring weights assigned. 
Having determined the scoring weights, the score for an individual 
is obtained by summing (algebraically) the scoring values associated 
with the responses he checks. An illustration of the scoring process 
and of the results is given in Table 7. 
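
The arithmetic of these steps can be sketched in a few lines of Python. The sketch below is an illustration only: the names `percentage_differences` and `scale_score` are our own, the weights are entered as they are read from Table 6 rather than computed (the text does not give Strong's formula or chart for converting differences into weights), and the subject's response to the item "Artist" is an assumption, harmless here because all three engineer weights for that item are zero.

```python
# Percentages answering L, I, D to "Author of technical book" (Table 6).
MEN_IN_GENERAL = (31, 41, 28)
ENGINEERS = (59, 32, 9)


def percentage_differences(group, reference):
    """Occupational-group percentage minus men-in-general percentage."""
    return tuple(g - ref for g, ref in zip(group, reference))


# Engineer-scale weights for the first 10 items, as read from Table 6.
ENGINEER_WEIGHTS = {
    "Actor (not movie)": {"L": -1, "I": 0, "D": 1},
    "Advertiser": {"L": -2, "I": 0, "D": 2},
    "Architect": {"L": 2, "I": -1, "D": -1},
    "Army officer": {"L": 1, "I": 0, "D": -1},
    "Artist": {"L": 0, "I": 0, "D": 0},
    "Astronomer": {"L": 1, "I": 0, "D": -1},
    "Athletic director": {"L": -1, "I": 1, "D": 0},
    "Auctioneer": {"L": -1, "I": -1, "D": 2},
    "Author of novel": {"L": -1, "I": 1, "D": 0},
    "Author of technical book": {"L": 3, "I": -1, "D": -2},
}

# Responses of the subject in Table 7 ("Artist" is assumed, see above).
RESPONSES = {
    "Actor (not movie)": "D",
    "Advertiser": "D",
    "Architect": "D",
    "Army officer": "I",
    "Artist": "L",
    "Astronomer": "L",
    "Athletic director": "I",
    "Auctioneer": "I",
    "Author of novel": "I",
    "Author of technical book": "L",
}


def scale_score(responses, weights):
    """Algebraic sum of the weights attached to the checked responses."""
    return sum(weights[item][answer] for item, answer in responses.items())


print(percentage_differences(ENGINEERS, MEN_IN_GENERAL))  # (28, -9, -19)
print(scale_score(RESPONSES, ENGINEER_WEIGHTS))           # 7
```

The full score on a scale is obtained in the same way, the sum being carried over all 400 items instead of the first 10.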

The Strong Vocational Interest Test for Men can be scored for 42 
occupations, and that for women can be scored for 24. For each of 
these occupations, Strong provides raw scores, standard scores, letter 
grades, and percentile norms. An example is given in Table 8. The 
means and standard deviations of the raw-score distributions differ, 
of course, from one scale to another. The means and standard 
deviations of the standard-score distributions are identical, however, 
for all scales. These are 50 and 10, respectively. Letter grades, also, 
are the same for all scales, being assigned, as they are, upon the 
basis of standard scores in accord with the schedule in Table 9. 


Table 9. Schedule of Letter Grades and Standard-score Equivalents*


Letter grade      Standard score      Percentage of cases

A                 45 and up                 70.2
B+                40-44                     11.9
B                 35-39                      9.6
B-                30-34                      4.8
C+                25-29                      2.5
C                 24 and below               1.0
Total                                      100.0


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.


The data in Table 9 show that 70 per cent of the members of each
occupational group secure a grade of A on the scale for their own
occupation, that 29 per cent receive grades of B+, B, B-, and
C+, and that only 1 per cent receives a grade of C. Hence,
when an individual secures a grade of A, that is, a standard score of
45 or higher, it means that his interests correspond to those of at
least 70 per cent of the members of that professional group. If he
gets a score of B, his interests correspond, at best, to no more than
those of 29 or 30 per cent of the members of the professional group
in question. And, finally, if he gets a score of C the chances are pretty
good that his own interests do not correspond to more than 1 per cent
of those of the members of that professional group. For all practical
purposes, they just do not correspond.
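
The conversion from raw score to standard score to letter grade can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are our own; the raw-score mean of 125 and standard deviation of 50 for the architect scale are not stated in the text but are inferred from the raw-score and standard-score columns of Table 8 (a raw score of 220 corresponds to 69, a raw score of 0 to 25); and a fractional standard score is graded by the lower limit it has reached, which is also an assumption.

```python
def standard_score(raw, mean, sd):
    """Standard score on a scale with mean 50 and standard deviation 10."""
    return 50 + 10 * (raw - mean) / sd


# Lower limits of the letter grades, from the schedule in Table 9.
GRADE_FLOORS = [(45, "A"), (40, "B+"), (35, "B"), (30, "B-"), (25, "C+")]


def letter_grade(score):
    """Letter grade for a standard score; below 25 the grade is C."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "C"


# Assumed raw-score mean and standard deviation for the architect scale.
ARCHITECT_MEAN, ARCHITECT_SD = 125, 50

for raw in (220, 90, 0, -10):
    score = standard_score(raw, ARCHITECT_MEAN, ARCHITECT_SD)
    print(raw, score, letter_grade(score))
```

Because every scale is reduced to the same standard-score distribution, the same schedule of grades serves for all occupations; only the raw-score mean and standard deviation change from one scale to another.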


CRITERION GROUPS 


To appreciate fully the value of the scores provided by the Strong 
Vocational Interest Tests, we should take careful note of the charac- 
ter of the criterion groups which Dr. Strong has used. 

Occupational Criterion Groups. We have already indicated that
representatives of different occupational groups were in turn, and
one by one, compared with a men- (or women-) in-general group. In
the case of each occupational group, the individuals were successful
practitioners in their particular line of work. Criteria of success were
generally some years of experience in the business, an average annual
income in excess of $2,500 (these [...] recognition
in terms of membershi[...] average number of subjects in each
[...]21, but this varies from a minimum [...]
The average number of cases in the [...]
but this also varies from a minimum [...]5. Tables 10 and 11 show the total
[...]iving the scales, the number of sub[...]
the average age of the subjects, and
the average school grade completed by the members of each criterion
group.

In the course of his research, Strong ha[...]
general groups. The one used in the [...]
[...]l and businessmen earning [...]
war standards). Each of these 106 groups
Table 10. Number of Cases, Average Age, and Average Educational Status of Men
in Occupational Criterion Groups*


                                          Standardiza-    Norm
Occupation                                tion group      group     Age     Education

Accountant                                    338          345      37.4      12.3
Advertising man                               230          168      37.6      14.0
Architect                                     [?]          241      42.8      14.4
Artist                                        278          231      42.7      11.9
Author-journalist                             230          249      45.0      14.3
Aviator                                       510          510
Banker                                        250          247      45.5      12.2
Carpenter                                     185          181      43.2      12.2
Certified public accountant                   423          354      37.3      14.3
Certified public accountant, senior           611          [?]      37.7      14.4
Chemist                                       293          297      35.2      16.8
City school superintendent                    190          190      46.5      16.9
Coast guard                                   256          256
Dentist                                       249          239      42.4      14.9
Engineer                                      513          513      43.9      15.4
Farmer                                        245          241      37.6      14.6
Forest service                                410          405      38.5      14.2
Lawyer                                        324          251      39.2      17.0
Life insurance salesman                       596          315      39.9      13.6
Mathematician                                 181          181      46.1      18.8
Mathematics-science teacher                   228          237      33.6      16.4
Minister                                      255          250      42.6      18.2
Mortician                                     360          [?]      44.7      13.0
Musician                                      250          250      32.6      12.4
Office worker                                 326          317      33.2      11.5
Osteopath                                     585          585      37.9
Personnel manager                             147          146      41.0      14.7
Pharmacist                                    315          [?]      41.2
Physician                                     432          337      40.9      18.5
Policeman                                     259          254      34.8      10.4
President of manufacturing concern            172          169      48.0      [?]
Printer                                       258          279      35.5      10.8
Production manager                            218          216      42.8      13.3
Psychologist                                 1048          [?]      44.0
Public administrator                          515          515
Purchasing agent                              221          219      39.8      11.8
Real estate salesman                          246          243      40.1      12.1
Sales manager                                 223          228      42.2      13.0
Social science teacher                        224          217      33.7      16.4
Veterinarian                                  310          [?]      44.4
Y.M.C.A. physical director                    220          215      31.4      14.0
Y.M.C.A. secretary                            113          113      42.0      14.4
Mean                                          321          276      40.1      14.2


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.
is represented in the men-in-general group in the same proportion
that it was represented in the United States census. 


Strong had available differing numbers of test blanks for each of
the 106 occupational groups. Therefore, since the occupations were
to be weighted in proportion to their relative numbers in the census
[...]


Table 11. Number of Cases, Average Age, and Average Educational Status of Women
in Occupational Criterion Groups*


Occupation                                                    Number     Age     Education

Artist                                                          402      43.4      14.3
Author                                                          402      39.9      14.8
Buyer                                                           204      34.4      12.4
Dentist                                                         195      40.0      16.0
Dietitian                                                       416      34.0      16.2
Housewife                                                      1256      38.2      12.5
Laboratory technician                                           356      33.5      15.1
Lawyer                                                          373      38.0      16.1
Librarian                                                       425      44.0      16.1
Life insurance saleswoman                                       205      46.0      13.5
Nurse                                                           396      34.0      13.2
Occupational therapist                                          162      34.1      14.5
Office worker                                                   226      33.6      12.3
Physician                                                       400      41.0      19.0
Psychologist                                                    380      37.4      18.4
Social worker                                                   432      38.0      16.3
Stenographer-secretary                                          298      29.3      12.5
Teacher of elementary school                                    238      36.0      15.9
Teacher of English in high school                               293      41.0      16.6
Teacher of home economics                                       420      35.6      16.5
Teacher of mathematics and physical science in high school      467      39.0      16.7
Teacher of physical education in high school                    250      33.4      16.5
Teacher of social science in high school                        396      35.0      16.6
Y.W.C.A. secretary                                              202      45.2      15.5
Mean                                                            366      37.7      15.3


* From Strong, E. K., Jr. Manual for Vocational Interest Blank for Women. Stanford
University, Calif.: Stanford University Press, 1947.
were taken as the responses of the one architect to be included in the
men-in-general group. The quota for teacher was 6. The number of 
blanks available was 282. Therefore, the averages were based upon 
the 282 cases, and these averages were multiplied by 6 to give them 
their proportionate quota weight in the men-in-general group. All 
of Strong’s currently published scales, unless otherwise specified, 
make use of this men-in-general group as a basic point of reference. 
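
The quota weighting just described amounts to a weighted average, as the following Python sketch suggests. It is an illustration under assumptions: the function name is our own, only two of the 106 occupations are shown, the quotas of 1 for architect and 6 for teacher are those given in the text, and the percentages of "like" responses (40 and 26) are invented for the example.

```python
def men_in_general_percentage(groups):
    """Quota-weighted average of occupational-group percentages.

    groups is a list of (quota, average_percentage) pairs, where the
    average is taken over all blanks available for that occupation.
    """
    total_quota = sum(quota for quota, _ in groups)
    weighted_sum = sum(quota * pct for quota, pct in groups)
    return weighted_sum / total_quota


# (quota, hypothetical percentage answering "L" to one item)
groups = [
    (1, 40.0),  # architect: quota of 1
    (6, 26.0),  # teacher: quota of 6, average based upon 282 blanks
]

print(men_in_general_percentage(groups))  # 28.0
```

The number of blanks on hand for an occupation thus affects only the stability of its average; the weight the occupation carries in the men-in-general group is fixed by its quota.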

When Strong first developed his interest test, he used as a men-in- 
general group all men for whom he happened to have records avail- 
able. But when revising his test in 1938, he decided to secure a 
men-in-general group truly representative according to the United 
States census. Strong secured such a group, used it in revising the 
occupational scales, and found such remarkably high intercorrela- 
tions among them that they appeared to be practically useless. This 
proved, at the time, a most disturbing finding, but later it became 
apparent to Strong that a men-in-general group according to the 
census represents a group much lower in the occupational hierarchy 
than the original men-in-general group composed, as it was, of 
business and professional men. These groups of business and pro- 
fessional men can be differentiated from one another when compared 
among themselves. But they have much in common when contrasted 
with a group far removed from them in the occupational hierarchy. 

To illustrate this problem a little more concretely, we may cite 
the experience we all feel in viewing for the first time the members of 
a foreign nationality group. To most Occidentals, all Chinese look 
pretty much alike, and they do, when contrasted with Occidentals. 
But Chinese do differ from each other if we compare them, not with 
Occidentals, but with each other. Thus, if we are interested in prov- 
ing how much Chinese look alike we should choose a point of refer- 
ence far removed from the central tendency of the group. We do this 
by choosing Occidentals as our point of reference. But, if we want 
to prove how different Chinese are from one another, we choose a 
point of reference representing the central tendency of the group, 
namely, the Chinese themselves. 

Strong’s original men-in-general group constituted a much more 
appropriate point of reference than did the one prepared according 
to the United States census. When Strong had solved this problem, 
he prepared the more adequate professional men-in-general group
already described.
NONOCCUPATIONAL SCALES


We have already indicated that in addition to the occupational 
scores, the Strong Vocational Interest Test yields scores for interest 
maturity, masculinity-femininity, and occupational level. 

Interest Maturity. This scale gives an indication of the extent to 
which a person’s interests are like those of 25-year-old men in con- 
trast with those of 15-year-old boys. It was developed to show how 
interests, in a quantitative sense, can change with age, and to aid 
in the classification of occupations in such a way as to show whether 
a high or low degree of interest maturity is appropriate for entry 
therein. 

If we are to use expressed interests as one of the bases for educa- 
tional and vocational guidance, it is important that we know some- 
thing about how interests change with age. Is there perhaps no 
change with age? Is there a consistent increase (or decrease) in 
liking or disliking certain items? Is there at one time an increase (or 
decrease) and at another time a decrease (or increase) in liking or 
disliking certain items? And, whatever the answers, what signif- 
icance do they have for vocational and educational guidance? 

Strong reports that “liking for approximately two-fifths of the 
items in the Vocational Interest Test increases or decreases in a 
straight line from 15 to 55 years of age; liking for two-fifths of the 
items increases (or decreases) from 15 to 25 years of age and then 
decreases (or increases) from 25 to 55 years; and liking for the 
remaining items differ with different groups of men.” Anticipating 
just a little, the interest-maturity scale shows that interests, on a 
quantitative basis, tend to change in a consistent direction over the 
age period 15 to 25 and that most of the change which is to occur 
over the age span 15 to 55 takes place by age 25. Therefore, it is 
possible to use the interest-maturity score to show how mature or 
how immature a person’s interests may be. The interests of a person 
with a high score on the interest-maturity scale are subject to 
relatively little future change. Therefore, his expressed interests 
provide a more stable and adequate basis for guidance than do the 
expressed interests of someone with a low score on the 
interest-maturity scale. 

We can use the interest-maturity score in two ways: to tell how 
near a person’s interests are to maturity and to tell, in connection 
with his occupational scores, which occupations he should and 
should not consider. A typical 15-year-old boy will secure a fairly 
low score on the interest-maturity scale. It can be expected, there- 
fore, that his interests will be subject to considerable future change. 
Therefore, not much reliance, for vocational guidance purposes, 
should be placed upon his pattern of interests. A 15-year-old 
boy who secures a high score on the interest-maturity scale has 
already acquired a mature interest pattern, and this pattern can 
be used, with some degree of confidence, as a basis for vocational 
guidance. 

During the course of his research on interest maturity Strong has 
developed three interest-maturity scales: the first contrasts the 
interests of 15-year-old boys and 55-year-old men; a second con- 
trasts the interests of 25-year-old men and 55-year-old men; and a 
third contrasts the interests of 15-year-old boys and 25-year-old 
men. Strong finds the intercorrelations among these scales to be as 
follows: 

    1. (15-25) vs. (15-55)      . . . 
    2. (15-25) vs. (25-55)      -.41 
    3. (15-55) vs. (25-55)       .03 


These correlations show that most of the change in interest be- 
tween ages 15 and 55 takes place between ages 15 and 25; and that 
the direction of the change which takes place between ages 25 and 
55 tends to be opposite that which takes place between ages 15 and 
25. Since vocational guidance, if it is to serve its primary purpose, 
must be given much nearer the age of 15 than 25, the changes which 
may take place over this ten-year period of time are much more 
important than the changes which take place thereafter. Therefore, 
the most useful of the three scales, and the only one now published, 
is that contrasting the interests of 15-year-old boys and 25-year-old 
men. The 15-year-old group consists of 472 boys fairly representative 
of the California high-school population of 1933, and the 25-year-old 
group consists of 215 men representing the occupational pattern in 
the United States census. 

Masculinity-Femininity. This scale shows the extent to which a 
person’s interests correspond to those of men in contrast with those 
of women. Strong has found it helpful to consider the score on this 
scale along with a person’s occupational scores to show whether he 
should consider entering an occupation characterized by more 
masculine interests (say, engineering) or an occupation characterized 
by more feminine interests (say, journalism). 

Altogether, Strong has developed seven masculinity-femininity 
scales. One of these, the one used chiefly in connection with the 
women’s interest blank, is based upon a comparison of men’s and 
women’s answers to the 263 items common to the original women’s 
blank and the revised men’s blank, and need not be discussed fur- 
ther. The other six scales, all developed in connection with the men’s 
interest blank, contrast the responses of various groups of men and 
women on all 400 items of the form. In these six scales the groups of 
subjects represented are 114 high-school boys and 114 high-school 
girls, 154 college men and 154 college women, and 335 adult men and 
335 adult women. Strong developed separate scales for each of these 
groups as well as three scales based upon all groups. These latter 
three scales differ from each other only in the range of item weights 
represented: (1) ±15; (2) ±3; and (3) ±4. 

To develop these scales Strong followed the item-weighting pro- 
cedure described on page 37. In this instance, there had to be 
computed the percentage of men and the percentage of women who 
responded L, I, and D to each item, the differences between these 
percentages, and the item weights. These procedures were repeated, 
of course, for each of the seven scales. In developing the scales based 
upon all subjects, the percentages for the high-school group, college 
group, and adult group were averaged. This gave each group equal 
weight in contributing to the composite or average scales. 

The intercorrelations among the various M-F scales are not per- 
fect, but they are high. The high-school and adult scales correlate 
.90; the high-school and college scales correlate .81; and the college 
and adult scales correlate .74. Strong reports that all three scales 
correlate .90 or over with the average scale . . . ±15 (high-school 
scale, .96; college scale, . . .). 

For most purposes we shall . . . Table 12. It shows the standard 
. . . and the distribution standard . . . high-school, college, 
and adult groups. The differences between the scores of men 
and women are obviously large and the critical ratios given in column 
6 demonstrate that they are of non-chance significance. 


Table 12. Means and Standard Deviations for Men and Women on the 
Masculinity-Femininity Scale* 

                          Men             Women 
Group            N     Mean    SD      Mean    SD     Critical ratio 
High school     129    52.3    8.2     30.1    7.9        18.7 
College         143    49.5   10.8     26.6    7.6        15.9 
Adult           100    47.7   10.2     26.8    7.9        17.3 
Total          . . .   50.0   10.0     27.9    7.9        31.2 


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University, 
Calif.: Stanford University Press, 1943. 

Strong has determined the relation between masculinity-femin- 
inity and occupational interests in three ways: in terms of mean 
M-F scores for different occupational groups, in terms of the cor- 
relations between M-F scores and occupational scores, and in terms 
of the critical ratios of the sex differences in mean occupational 
scores for contrasting groups of men and women. The results of these 
three methods (each method being applied to a different set of data) 
are given in Table 13. The rank-order correlation among the classi- 
fications is exceedingly high, averaging .90 for all combinations 
among the three methods of classification. We may cite as examples 
of masculine occupations engineer, farmer, purchasing agent, chem- 
ist, and dentist. All these are included in the one-third most mascu- 
line groups in all three classifications. Occupations consistently 
classified in the one-third most feminine groups on all three bases 
are musician, artist, advertising man, journalist, minister, and life 
insurance salesman. Occupations consistently classified as neutral 
(neither masculine nor feminine) are physician, psychologist, mathe- 
matician, Y.M.C.A. physical director, realtor, certified public 
accountant, and architect. 
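
A rank-order correlation of the kind averaged above can be sketched as follows. The ranks are hypothetical illustrations and not Strong's data, and the function assumes the familiar no-ties formula, rho = 1 - 6(sum of squared rank differences) / n(n² - 1); the text does not say which formula Strong used.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for two untied rankings of the same items."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must cover the same items")
    n = len(ranks_a)
    # Sum of squared differences between the paired ranks.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks (1 = most masculine) of six occupations under two of the
# three methods of classification. These are not values from Table 13.
mean_score_ranks = [1, 2, 3, 4, 5, 6]
correlation_ranks = [2, 1, 3, 4, 6, 5]
rho = spearman_rho(mean_score_ranks, correlation_ranks)
```

Two rankings that differ only by a few adjacent swaps, as here, still yield a high coefficient, which is the sense in which the three classifications are said to agree.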

The masculinity-femininity score can be used in conjunction with 
the occupational scores to help a person decide upon the suitability 
of a given occupation. A person who gets a high score as lawyer can 
have added confidence that this is the right occupation if he also 
gets a feminine score on the M-F scale; and a person who gets a 
high score as an engineer can have greater confidence that this is the 
right occupation for him if he also gets a high masculine score on the 
M-F scale. 


In using the M-F scale it is important to know whether the scores 
on it vary with, or are independent of, age. Strong finds only a slight 


Table 13. Classification of Occupations According to Masculinity-Femininity* 


Critical ratio basis Mean score basis Correlation basis 
BAGINCER. apurinsa 8.8) Engineer..........,,., S Parmer. a jor cow ces cow + 
PRIN CE croix sconce 7.0| Chemist.. Engincer...........005 
Purchasing agen 6.4] Teacher of mathematics Purchasing agent 
Chemist........ 6.3} and physical science... .|50 Chemists saisi zgi 
Accountant... . AS RATNE; za ga onto an sts inl 50| Physicist... . 

PHY SICSE sie a israe 3.8) Purchasing agent... ... 49) Accountant 
DSH ESB A, ly Demtistssieisicsin-ciaan en on 48| Office man. ........... 
1.8) Personnel manager... ., 47) Dentist... 
0.8) Office man............ 46) Mathematician 
0.5} Certified public account- Y.M.C.A. phy 
OH settles arend i. ae cna 
Mathematician... --| 0.3| Accountant, -|46| Physician. , 
Y.M.C.A. physical direc- Psychologist, . -|45| Personnel m: nage 
an E PE oh 0.4| Mathematician, .. -|45) Architect............. 
Realit resso Realtor Psychologist........... 
Y.M.C.A, secretary, Realtor 
Teacher 


= 09 
Y.M ecretary = 
3| Y.M.C.A, secretary... 
Teacher. ... — 40 
Musician i A 
3| Artist, . me 
ent -7| Teacher of social science 42) Life insurance sales — 49 
Life Jnsurance salesman, .|—3 4 Life insurance salesman |42| City school superintend- | 
Minister... 3 ent... Agee 
Journalist... ce i Minister Say | 56 
Advertising man.. -1| Advertising man... __ K Lawyer cane. —.62 
Artist... Soh MIER s aa 37 Journalist aii — 64% 
Musician... , Binani as —5.6| Artist... a3 5371 Advertising ma = Tt 
Journalis -|36 
* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University, 
Calif.: Stanford University Press, 1943. 


relation to age, but the older a person gets the more feminine his 
interests tend to become. This is true of both men and women. There 
is always a large gap between men and women of comparable ages, 
but at the older ages the differences tend to be less than at the earlier 
ages. 

Occupational Level. This scale gives an indication of the extent 
to which a person’s interests are like those of professional men in 
contrast with those of unskilled workers. It was constructed by 
contrasting the responses of a group of 4,746 professional and busi- 
nessmen with those of a group of 258 unskilled workers. Therefore, 
if a person secures a high score on the occupational-level scale, he 
should consider some activity at the business or professional level. 
If he secures a low score he should consider seriously such occupa- 
tions as policeman, carpenter, printer, etc., i.e., those at the skilled, 
semiskilled, or unskilled level. 

The occupational-level scale can also be used as an aid in deciding 
whether or not the score on any given occupational scale is appro- 
priate. For example, if a person gets a high score as lawyer and a 
high score on the occupational-level scale, he can have more con- 
fidence in the score as lawyer than if he had received a low score on 
the occupational-level scale. Also, if a person receives a high score as 
carpenter and a low score on occupational level, he can be more 
certain that this occupation is appropriate than if he had secured a 
high score on occupational level. 


INTERCORRELATIONS AMONG SCALES 


It will have occurred to the reader that occupational groups such 
as physicists and chemists will have many more interests in common 
than, let us say, artists and certified public accountants. Therefore, 
we are led to ask whether it is possible to make any meaningful 
classifications of the occupations for which scales have been devel- 
oped. If so, the results can be telescoped into useful patterns. In 
other words, in interpreting the results on the Strong Vocational 
Interest Test, must we concern ourselves with each and every one of 
the occupational scores and with the three special scales, interest 
maturity, masculinity-femininity, and occupational level? Or is 
there some useful, economic, and meaningful method of grouping 
the scores to reduce the number of apparently separate and discrete 
score entities? 

Strong has experimented with three methods of classification: 
upon the basis of the three special scales, upon the basis of factor 
analysis, and upon the basis of a trial-and-error empirical grouping 
based upon the scale intercorrelations. The last two methods give 


Table 14. Occupational Interest in Relation to Interest Maturity, Masculinity-Femininity, Occupational Level, and Intelligence* 


                        Standard scores                              Correlations 
Group                   Interest | Masculinity-| Occupa-      Interest | Masculinity-| Occupa-      Intelligence 
no.     Occupation      maturity | femininity  | tional level maturity | femininity  | tional level 
1 Artist 46.2 33.0 | 58.9 = 33 —.44 18 1S 
| Psychologist 51.6 47.9 | 60.9 —.14 Sli = 1 38 
| Architect 50.7 | 43.8 | 61.0 — .46 sail = 303 23 
| Physician 49:7 | 464 | 63 = — .06 .03 24 
Dentist 51.8 eo | y7 —.39 J8 = J4 07 
BY 
2 | Mathematician 49.4 47.8 as | = ay | aT =e 35 Š 
| Physicist 47.8 55.7 61.0 | =.51 32 = 17 34 RI 
| Engineer 51.6 61.9 61.4 -H 64 = 70) 28 S 
Chemist 50.6 57.1 60.0 | —.38 4 | —.28 BS = 
| | | = 
3 Production manager 53.0 | 59.1 60.2 | 01 79 —.23 .04 8 
4 Aviator 50.7 58.2 =h -76 —.59 2 = 
Farmer 50.2 51.2 | —,29 68 —.62 06 = 
| Carpenter 51.3 58.6 —.14 63 —.72 | =02 = 
| Printer 53.4 47.3 03 37 — .&2 12 
| Mathematics science teacher 55.1 50.3 3 49 —.72 08 
| Policeman 53.8 51.2 | 27 a57 -.77 -.13 
| Forest service 52.5 52.2 08 a) —.62 — .03 
5 | Y.M.C.A. physical director 56.2 47.1 55.8 67 —.03 -49 = 18 
Personnel manager 56.5 50.7 61.4 tS — 06 — 10 —.02 
| Y.M.C.A. secretary 58.7 40.0 59.4 84 —.34 —.18 -I$ 
| Social science teacher 57.0 42.9 36.1 75 = S l =a 
| City school superintendent 56.5 44.6 63.4 63 = 51 .10 — .06 


Group 
no.     Occupation 
6 Minister 
7 Musician 
8 Certified public accountant 
9 | Accountant 
Office worker 
Purchasing agent 
| Banker 
10 | Sales manager 
Real estate salesman 
| Life insurance salesman 
11 Advertising man 


Lawyer 
Author-journalist 
President of manufacturing concern 


(Continued) 
Standard scores | Correlations 
Interest | Masculinity-| Occupa- | Interest | Masculinity-| Occupa- | Intelligence 
maturity | femininity | tional level | maturity | femininity | tional level | 
57.3 35.1 58.8 + 56 —.14 | 02 
52.8 40.6 53.8 04 AL | — .42 02 
| | | 
539 46.4 63.4 .09 | 29 | 43 22 
55.4 59.5 Ai 32 | —.26 | —.10 
54.6 57.0 | 61 | 16 | —.33 | —.25 
53.9 0 .03 460 | W | 21 
53.4 58.1 | a o | 05 | =.33 
| 
54.3 51.8 63.3 21 s | 42 | —.23 
51.9 47.3 60.4 —.02 | -28 | 41 —.22 
53.8 42.4 62.3 2 .49 AT —.26 
52.8 39.0 | 63.8 —.08 | 74 52 01 
52.4 47.0 | 64.4 —.15 | 62 | -60 Be is} 
47.5 31.8 | 63.0 —.45 | 66 | 46 18 
52.8 51.7 | 63.4 —.32 03 | 63 —.03 


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University, Calif.: Stanford University Press, 1943. 
essentially similar results, but Strong has found it helpful to consider 
the results of both methods in relation to each other. Table 14 shows 
the results of these classifications and the correlations of the occupa- 
tional scales with intelligence and with the three special scales, 
interest maturity, masculinity-femininity, and occupational level. 
This table is useful in showing that scores on occupations such as 
artist, psychologist, architect, physician, and dentist tend to be 
highly related to each other. Therefore, if a student is interested in 
dentistry, let us say, and if he gets high scores in other occupations 
in Group 1, as well as a high score in dentistry itself, he can have 
additional confidence in his intended choice of a vocation. However, 
if he secures high scores, not on the scales in the same group but, let 
us say, on the scales in Group 8, purchasing agent, office worker, 
accountant, and banker, these will call into question the advisability 
of his going into dentistry. 


GROUP SCALES 


To facilitate econom . . . occupational classification, Strong . . . 

Group I. Artist, psychologist, architect, physician, dentist 
Group II. Engineer, chemist 
Group V. Y.M.C.A. . . . manager, Y.M.C.A. secretary, . . . superintendent, minister 
. . . accountant, office worker, purchasing agent, banker 


PERMANENCE OF INTERESTS 


One of the important considerations in the use of an interest test 
for educational or vocational guidance is the permanence of the 
psychological variables measured by the test. If interests fluctuate 
widely from one period of time to another, there would be little 
likelihood that we could use interests at one time as a proper guide 
for those at another time. Strong has devoted considerable attention, 
therefore, to a study of the permanence of the interests measured 
by the Vocational Interest Test. He has measured this permanence 
of interests in four ways: in terms of the correlation between two 
series of scores, the correlation between two profiles, the comparison 
of mean scores, and the comparison of ratings before and after specified 
periods of time. We shall discuss briefly the results secured from each 
of these methods, although it is obvious that they are not inde- 
pendent of one another. 

Correlational Data. For 29 occupational scales Strong finds an 
average correlation of .80 for a group of college freshmen tested one 
year after they first took the test, an average correlation of .79 
three years later, and an average correlation of .56 nine years later. 
For a group of college seniors, Strong finds an average correlation 
of .75 with scores five years later and an average correlation of .71 
with scores ten years later. Except for the correlation of .56 for the 
college freshmen, these all seem like highly respectable indications 
of permanence of interests. In extenuation of the correlation of .56, 
we might point out that this could reflect the initial younger age of 
this group upon its first testing. Interests do tend to change with 
age, and the younger the person upon the first test, the greater the 
likelihood of change thereafter. College seniors are much nearer 25, 
the age after which no appreciable change, or at least relatively 
little change, in interests will take place. 

Profile Data. In this instance, for each of 50 individuals in three 
groups of subjects (freshmen, seniors, and graduate students), 
Strong computed the correlations between 34 scores on an original 
test with those secured on retests five, ten, and twenty-two years 
later. Thus each correlation to be mentioned represents profile 
consistency over each of the periods of time in question. The median 
correlations which Strong reports are .84 for a five-year interval, 
.82 for a ten-year interval, and .76 for a twenty-two-year interval. 
We can certainly agree “that the chances are very good that those 
who had interests most similar to engineers, lawyers, or ministers 
while in college will have similar scores twenty years later.” 
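
The difference between the correlational data and the profile data can be sketched as follows. All scores are hypothetical and far fewer than Strong's 29 or 34 scales and 50 persons; `pearson_r` is an ordinary product-moment correlation written out for illustration, not a procedure quoted from Strong.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: each row is one person, each column one occupational scale.
first_test = [
    [45, 30, 52, 38],
    [28, 47, 35, 50],
    [39, 41, 44, 33],
]
retest = [
    [47, 28, 50, 40],
    [30, 45, 37, 48],
    [36, 43, 45, 35],
]

# Correlational data: one coefficient per scale, computed across persons.
scale_r = [
    pearson_r([person[j] for person in first_test],
              [person[j] for person in retest])
    for j in range(len(first_test[0]))
]

# Profile data: one coefficient per person, computed across that person's
# scales, then summarized by the median across persons.
profile_r = [pearson_r(a, b) for a, b in zip(first_test, retest)]
median_profile_r = median(profile_r)
```

The first approach asks whether people keep their relative standing on a given scale; the second asks whether a given person's pattern of high and low scores keeps its shape.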

Mean-score Data. Strong gives data for 18 occupational scales for 
168 college seniors tested in 1927, 1932, and 1937. He finds only 
seven statistically significant changes from 1927 to 1932 and two such 
changes from 1932 to 1937. Between 1927 and 1937 he finds eight 
statistically significant changes. In most of the comparisons the 
mean scores increased but only by a very slight amount. Strong 
concludes that if changes due to increasing interest maturity could 
be subtracted from the mean-score changes, there would be very 
little change left to be accounted for. The average score of 95 seniors 
before and after entering an occupation and remaining in it for ten 
years changed only from 46.2 to 46.9. It is evident that in terms of 
mean scores interests may be considered fairly stable and permanent. 

Rating Data. The extent to which a person taking the Vocational 
Interest Test at one time may expect, upon a second testing, to get 
the same rating or a rating one or two steps removed from the 
original rating is set forth in Table 15. Strong concludes, “Even 
after ten years, it appears there is only one chance in one hundred 
with college seniors that an A or C diagnosis is incorrect, that is, an 
A becomes a C or vice-versa.” 


Table 15. Comparison of College Seniors on Two Different Testings* 

                                    Time       Received    Received identical   Received identical 
                                  interval,    identical   ratings or ratings   ratings or ratings one 
Group                              years       ratings     one step removed     or two steps removed 
High school juniors                  6          . . .          . . .                 . . . 
College freshmen                     1          52.0           . . .                 . . . 
College seniors                      5          45.9           . . .                 93.2 
College seniors                     10          . . .          . . .                 . . . 
College seniors, 5 years later       5          48.3           82.0                  95.6 


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University, 
Calif.: Stanford University Press, 1943. 


PREDICTION OF SUCCESS 


We can now turn our attention to the extent to which the Voca- 
tional Interest Test is useful in differentiating superior and inferior 
students, and superior and inferior members of various occupational 
groups. 

Academic Success. We have available for the study of this prob- 
lem scales designed to differentiate between occupations, between 
courses of study, and between superior and inferior students. 

Occupational Scales. The occupational scales yield very low cor- 
relations with academic achievement. The highest correlation which 
Strong reports is one of .3+ between the engineering scale and grades 
in engineering. The Vocational Interest Test does poorly, therefore, 
what an intelligence test or scholastic aptitude test can do much 
better. Strong reports a study by Segel, however, in which Segel 
correlated the occupational scales with “(a) the differences between 
grades in two school subjects and (b) the differences between achieve- 
ment scores in two educational subjects.” In this study, Segel found 
a correlation of .61 between the engineering scale and the differences 
in grades for mathematics and science-history. He found a correla- 
tion of .57 with the differences in scores between a mathematics and 
a history-social-science test. These are higher than those found to 
obtain between the engineering scale and grades or test scores. 
Strong’s explanation is that the difference in achievement in two 
school subjects represents the residue to be explained after the 
ability or intelligence factor is eliminated from consideration. In 
other words, ability is required for achievement in any school sub- 
ject. But the procedure of subtracting the grade in one subject 
from that in another cancels ability as a factor. Therefore, it is 
reasonable to suppose that the difference in grades is due to some- 
thing other than ability, and part of this other factor could easily 
be a person’s interests. 
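
The reasoning can be put in a simple additive form. The model below is an illustrative assumption and not a formula given by Strong or Segel: let G_1 and G_2 be a student's grades in two subjects, A the ability common to both, I_1 and I_2 the contributions of interest in each subject, and e_1 and e_2 chance errors.

```latex
\begin{align*}
G_1 &= A + I_1 + e_1, \\
G_2 &= A + I_2 + e_2, \\
G_1 - G_2 &= (I_1 - I_2) + (e_1 - e_2).
\end{align*}
```

Under this assumption the ability term drops out of the difference, so an interest scale may correlate more highly with the difference in grades than with either grade alone.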

Scales Differentiating Courses of Study. We can summarize the 
work in this area by quoting Strong directly: 


. . . it is possible to differentiate students in terms of major courses of study on 
the basis of interests in the same way that men are differentiated with respect to 
occupations. So far this has not been done in a thoroughgoing manner nor with as 
high a degree of differentiation. Three explanations may be advanced: First, scales 
have not been based in most cases on large enough samples. Second, the criteria 
are not as good as with occupational scales. . . . Third, the interests of students are 
less stable than the interests of adults who are well established in their occupation. 


Superior and Inferior Students. The most significant study in this 
area is that centering around the work of Young and Estabrooks and 
their studiousness scale. They constructed this scale by proceeding 
through the following steps: 


1. They computed the correlation between intelligence and grades. 

2. Upon the basis of the regression equation which they obtained, they made 
predictions from intelligence as to what grades 588 students, individually, should 
obtain. 

3. They subtracted the predicted grade from the obtained grade to get a residual 
score. This they interpreted as a measure of studiousness. 

4. They selected as two criterion groups the 100 students with the highest 
studiousness scores and the 100 students with the lowest studiousness scores. 

5. They determined the percentage of each of these groups that gave the alterna- 
tive answers L, I, and D to each item, and from the differences obtained they 
constructed their studiousness scale. 
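
The five steps can be sketched in miniature as follows. The six students, their scores, and their answers are hypothetical (the original study used 588 students and criterion groups of 100), and the function names are illustrative. The conversion of the percentage differences into item weights is left out, since that procedure is not restated here.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def studiousness_scores(intelligence, grades):
    """Steps 1-3: obtained grade minus the grade predicted from intelligence."""
    slope, intercept = fit_line(intelligence, grades)
    return [g - (intercept + slope * i) for i, g in zip(intelligence, grades)]


def criterion_groups(residuals, size):
    """Step 4: indices of the `size` highest and the `size` lowest residuals."""
    order = sorted(range(len(residuals)), key=lambda k: residuals[k])
    return order[-size:], order[:size]


def item_differences(answers, high, low):
    """Step 5: percentage-point difference (high minus low) for L, I, and D."""
    def pct(group, choice):
        return 100 * sum(answers[k] == choice for k in group) / len(group)
    return {choice: pct(high, choice) - pct(low, choice) for choice in "LID"}


# Hypothetical data for six students.
intelligence = [100, 110, 120, 130, 140, 150]
grades = [2.4, 2.0, 3.0, 2.6, 3.6, 3.2]
item_answers = ["L", "D", "L", "I", "L", "D"]  # each student's answer to one item

residuals = studiousness_scores(intelligence, grades)
high, low = criterion_groups(residuals, size=2)
diffs = item_differences(item_answers, high, low)
```

Because the residual is what remains after intelligence has been used to predict grades, an item that separates the high and low groups is separating students on something other than measured intelligence.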


This scale was found to correlate approximately .33 with college 
grades at Colgate University. For the same students, intelligence- 
test scores correlated .45 with grades. Using both intelligence and 
the studiousness scale a multiple R, predicting grades, of .56 was 
obtained. At the University of Florida, Mosier was able to verify 
these results for liberal-arts students but not for technical or busi- 
ness students. At the University of Minnesota, Williamson, who 
made still another check of the results, found that the studiousness 
scale correlated .20 with grades for liberal-arts students. This did 
not turn out to be a very useful relationship, however. When com- 
bined with the predictions made from the American Council on 
Education Psychological Examination, the multiple correlation 
predicting grades was found to be .48, an increase of only .03 over 
what the ACE alone was capable of doing. 
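
The way two predictors combine into a multiple R can be sketched with the standard two-predictor formula. The correlation between intelligence and the studiousness scale is not reported in the text, so the value of zero used below is an assumption made only for illustration.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlation of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Colgate figures from the text: intelligence .45 and studiousness .33 with grades.
# r_12 = 0.0 is assumed, not reported.
colgate_r = multiple_r(0.45, 0.33, 0.0)
```

With uncorrelated predictors the result comes to about .56, in line with the reported multiple R. When a second predictor overlaps heavily with the first, or correlates only weakly with grades, it adds little, as in the Minnesota result.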

Vocational Success. This problem can be attacked by using the 
occupational scales or by developing new scales designed to differ- 
entiate between superior and inferior members of a specific occupa- 
tion. We shall review the evidence available from both these lines of 
study. 

Occupational Scales. Strong presents more complete data for the 
life insurance scale than for any other. Table 16 shows the relation 
between the scores on the life insurance scale and the average annual 
production of 211 life insurance agents. This table shows, among 
other trends, that 56 per cent of those with A ratings, but only 6 
per cent of those with C ratings, achieved an average annual pro- 
duction of $150,000 or more. Another way of seeing the relationship 
is to note that 52 per cent of those receiving C ratings produced less 
than $50,000 per year, that 24 per cent produced from $50,000 to 
$99,000 per year, that 18 per cent produced $100,000 to $149,000 per 
year, and, as we noted above, that only 6 per cent produced $150,000 
or more per year. The reader can examine other columns or rows of 
the table for additional confirmation of the relationship between the 
test scores and production. We can summarize the over-all relation- 
ship in a coefficient of .37. This . . . many academicians, but it is 
. . . degree of relationship between . . . insurance business. 


Table 16. Scores on the Life Insurance Scale and Average Annual Production 
of Life Insurance* 

                                   Percentage receiving rating 
Production               N       C     B-     B     B+     A 
$400,000 and up          6       0      0     5      4     3 
$200,000 to $399,000    47       0     17     0     20    31 
$150,000 to $199,000    37       6     17     9     13    22 
$100,000 to $149,000    31      18     17    14      7    19 
$50,000 to $99,000      52      24     17    45     34    16 
$0 to $49,000           38      52     32    27     22     9 
Total number           211      17      6    22     45   121 


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University, 
Calif.: Stanford University Press, 1943. 
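
The percentages quoted from Table 16 can be checked by adding down a rating column. The sketch below transcribes four of the five rating columns (B- is omitted for brevity); the bracket labels and the function name are illustrative.

```python
# Percentage of agents at each production level, by rating (from Table 16).
# Rows run from the highest production bracket to the lowest, in $ thousands.
brackets = ["400+", "200-399", "150-199", "100-149", "50-99", "0-49"]
percent_by_rating = {
    "C":  [0, 0, 6, 18, 24, 52],
    "B":  [5, 0, 9, 14, 45, 27],
    "B+": [4, 20, 13, 7, 34, 22],
    "A":  [3, 31, 22, 19, 16, 9],
}


def percent_at_or_above(rating, bracket):
    """Percentage of agents with `rating` whose production reached `bracket` or more."""
    cutoff = brackets.index(bracket)
    return sum(percent_by_rating[rating][: cutoff + 1])


share_a = percent_at_or_above("A", "150-199")  # 3 + 31 + 22
share_c = percent_at_or_above("C", "150-199")  # 0 + 0 + 6
```

The two sums are the 56 per cent and 6 per cent cited for production of $150,000 or more.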


. . . Marion A. Bills. She found . . . The data are presented in 
Table 17. Reading across the top row of the table, we find a steady 
decline, from high to low interest scores, in the percentage of cases 
rated “outstanding successes.” Conversely, as we proceed . . . of 
the table, from high to low interest scores, we find an increase in 
the percentage of cases rated “failures” by their managers. 
The over-all relationship is expressed in a coefficient of only .25 but 
again this represents a significant relationship. There certainly is 
little question that in the fields of life and casualty insurance there 
is a useful relationship between the scores on the life insurance 
interest scale and success in selling. This shows, as we remarked a 
few pages earlier, that the differences between life insurance sales- 
men and men-in-general are related positively to the differences 
which distinguish the successful from the unsuccessful life insurance 
salesman. 


Table 17. Scores on the Life Insurance and Realtor Scales and the Production of 
Casualty Insurance* 

                         Percentage of cases who received designated ratings 
Ratings                  . . .    . . .    . . .    . . .    . . . 
Outstanding success        25       16       11        8        4 
Success                    53       56       47       39       20 
Failure                    22       28       42       53       76 
Total number             . . .     130      193     . . .      55 


* From Bills, M. A. Relation of scores in Strong’s interest analysis blanks to success in selling 
casualty insurance. J. Appl. Psychol., 1938, 22, 97-104. 


Table 18. Occupational Interest Scores in Relation to Supervisors’ 
Ratings* 

Scale                              r 
. . .                            .341 
Engineer                         .307 
Certified public accountant      .253 
. . .                            .139 
Personnel manager               . . . 
. . .                            .034 
. . .                           -.009 
Life insurance salesman         -.308 


* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University, 
Calif.: Stanford University Press, 1943. 

There have been reported entirely too few studies on the relation- 
ship of the occupational interest scores to success in fields other than 
life insurance. One such study, however, resulted in the correlations 
reported in Table 18. These show the relationships for 59 foremen 
between the ratings given them by their supervisors and their own 
scores on several of the occupational scales. Since these foremen were 
employed in chemical and engineering plants, the correlations seem 
appropriate. 

Superior and Inferior Members of an Occupation. Strong reports 
only one study in this connection. It consisted in an attempt to 
differentiate successful from unsuccessful aviators. Strong used as his 
successful aviators 101 Army, 71 Navy, and 215 transport pilots 
and 125 Civilian Aeronautics Authority instructors, and contrasted 
these with 173 “failures.” These “failures” consisted “of 65 naval 
trainees and 32 Civilian Aeronautics Authority trainees who failed 
their preliminary course and 76 men rated the ‘poorest in my sec- 
tion.’” The results achieved with this scale, as well as with the 
aviation interest scale itself, proved ineffective in differentiating 
successful from unsuccessful aviators. Here we find a result 
different, then, from that found in the case of life and casualty 
insurance salesmen. This illustrates the danger of attempting to 
generalize the results which may be secured for one scale to those 
which may be expected for any of the others. 


RELIABILITY, OBJECTIVITY, AND VALIDITY 


d the methods Strong used in the develop- 


Having now describe s c 
having mentioned something 


ment of the Vocational Interest Test, l 
about the nature of each scale, and having given some indication 
of the uses to which these scales may be put, we must now inquire 
into the data pertinent to a determination of their reliability, objec- 
tivity, and validity. These data will show how effectively the Voca- 
tional Interest Test measures the area of personality it was designed 


to explore. f 
Reliability. Table 19 shows the Spearman-Brown coefficients 
le gives data for 35 of the men’s 


which Strong has reported. This tab 4 : 
Occupational scales and for 18 of the women's occupational scales. 


It gives data for 6 of the group scales and for the nonoccupational 
Scales: interest maturity, masculinity-femininity, and occupational 
level. The coefficients for men are based upon the records of 285 

tanford seniors, and the coefficients for women are based upon the 
records of 500 married women. None of these cases was included in
the original criterion groups. The lowest of the coefficients reported
is that of .73 for certified public accountant, and the next two lowest
are for life insurance saleswoman and the women’s
masculinity-femininity scale. Both of these coefficients are .74. The
highest reliability reported is .94. This obtains for the author scale
on the women’s blank, for the engineering and author-journalist scales
on the men’s blank, and for the chemist-engineering scale and selling
scale on the group keys. Of the 64 coefficients 26 are .90 or over.


TABLE 19. Reliability Coefficients for the Scales on the Vocational Interest Test*

                                                  r
Men’s Scales:
   Artist ....................................... .92
   Psychologist ................................. .88
   Architect .................................... .90
   Physician .................................... .89
   Dentist ...................................... .84
   Mathematician ................................ .92
   Engineer ..................................... .94
   Chemist ...................................... .91
   Production manager ........................... .85
   Aviator ...................................... .90
   Farmer ....................................... [illegible]
   Carpenter .................................... .90
   Printer ...................................... .80
   Mathematics-science teacher .................. .88
   Policeman .................................... .83
   Forest service ............................... .88
   Y.M.C.A. physical director ................... .84
   Personnel manager ............................ .82
   Y.M.C.A. secretary ........................... .90
   Social science teacher ....................... .88
   City school superintendent ................... .84
   Minister ..................................... .90
   Musician ..................................... .87
   Certified public accountant .................. .73
   Accountant ................................... .84
   Office worker ................................ .88
   Purchasing agent ............................. .85
   Banker ....................................... .83
   Sales manager ................................ [illegible]
   Realtor ...................................... [illegible]
   Life insurance salesman ...................... .93
   Advertising man .............................. [illegible]
   Lawyer ....................................... [illegible]
   Author-journalist ............................ .94
   President of manufacturing concern ........... .82

Women’s Scales:
   Artist ....................................... .93
   Author ....................................... .94
   Dentist ...................................... .78
   Office worker ................................ .92
   Housewife .................................... .90
   Lawyer ....................................... .81
   Librarian .................................... .87
   Life insurance saleswoman .................... .74
   Nurse ........................................ .87
   Physician .................................... .87
   Social worker ................................ .83
   Stenographer-secretary ....................... .85
   Teacher of English ........................... .82
   Teacher of mathematics-physical science ...... .84
   Teacher of social science .................... .86
   Y.W.C.A. secretary ........................... .88
   Physical education teacher ................... .86
   Elementary school teacher .................... .90

Nonoccupational Scales:
   Interest maturity ............................ .93
   Occupational level ........................... .88
   Masculinity-femininity (men) ................. .93
   Masculinity-femininity (women) ............... .74

Group Scales:
   I. Physician ................................. [illegible]
   II. Chemist .................................. .94
   V. Y.M.C.A. secretary ........................ [illegible]
   VIII. Accountant ............................. [illegible]
   IX. Life insurance ........................... .94
   X. Lawyer .................................... [illegible]
   [One further group-scale coefficient, .90, is legible, but its row is not.]

* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.

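
Since the coefficients in Table 19 are described as Spearman-Brown
coefficients, it may help to see the arithmetic of that correction.
The sketch below uses the standard Spearman-Brown formula; the
half-test correlations passed to it are illustrative values only and
are not figures reported by Strong.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Estimate the reliability of a lengthened test.

    r_half is the correlation between two comparable parts of the test
    (for example, odd and even items); length_factor is how many times
    longer the full test is than one part (2 for split halves).
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Illustrative half-test correlations (assumed, not from the source).
for r_half in (0.60, 0.75, 0.85):
    print(f"half-test r = {r_half:.2f} -> full-test r = {spearman_brown(r_half):.3f}")
```

The correction always raises a positive half-test correlation, which
is why a coefficient described as Spearman-Brown should be read as an
estimate for the full-length scale rather than as a raw correlation
between halves.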
Objectivity. This term has usually been taken as referring to the
degree to which a test score may be in error because of variations
introduced by the person scoring the test. Using the term in this
traditional sense, we may say that the Vocational Interest Test is
completely objective. There is a fixed and set series of weights to
be used in scoring, and there is no possibility that variation in
results can be introduced by the test scorer. (We are ruling out, of
course, purely mechanical errors, as we are assuming that adequate
provision will be made for checking the numerical accuracy of all
test results.) There may be minor variations induced by a test
examiner, depending upon the tone he sets in giving instructions
for taking the test. If he implies by his actions that the test results
are to be treated lightly, chances are that the subject will not give
the same care in answering as otherwise might be the case.

The situation under, and the purpose for, which the test is given 
would also seem to be a source of uncontrollable variation. A college 
student who really believes that the test will help him make a wise 
choice of a future career will undoubtedly exert more care in giving 
his answers than will another student not so convinced. And both 
of these students, having nothing to gain and much to lose by dis- 
honesty, may give a different set of answers than will the applicant 
who feels that the showing he makes on the test will have some 
bearing upon his being considered favorably for employment.

In this latter case, we must recognize three possible sets of condi- 
tions under which the test may be taken: honestly, dishonestly at a 
conscious level, and dishonestly at a subconscious level. Test scores 
secured under one of these conditions may show little relation to 
those secured under either of the other two conditions. We can 
certainly venture the assertion that test scores are most useful, not 
only to a potential employer, but also to an applicant himself, if he 
gives an honest set of answers. 

In the employment situation we must recognize, however, that an 
applicant may put down consciously or unconsciously the answers 
which he feels will get him the job, rather than those which may 
more accurately describe him. A number of investigators have found 
that it is easy, when one is so inclined, to fake the answers on the 
Vocational Interest Test. Strong, himself, asked 22 engineering 
students and 13 business-school students to secure scores as high 
as possible on the engineering interest scale. The engineers, even 
though their original scores on engineering were high, increased their 
average score on the scale by 142 raw-score points. The business- 
school students, whose scores on the engineering interest scale were 
low, raised their average score by 392 raw-score points. 

Steinmetz asked 46 students to secure high scores on the teacher- 
administrator scale and found a mean raw-score increase over their 
original scores of 247 points. This deliberate fudging also caused 
large changes on the scales for Y.M.C.A. secretary, minister, per- 
sonnel manager, and certified public accountant. At the same time, 
it caused a marked decrease in scores for realtor, artist, and farmer. 
There is certainly no question that test scores can be deliberately 
falsified. The only safeguard is to “sell” a person on the value to 
himself of giving honest answers and of thus getting a report of real
value. 

Validity. We now come to the most important of the three basic 
requirements for any psychological measuring instrument. Does the 
Strong Vocational Interest Test give results which can be taken as 
valid? There is to this question, as much as we might like it, no 
straightforward and simple answer. We cannot say that the test is 
valid or is not valid without specifically defining the purpose to be 
served or the specific scale in question. Let us review the scales 
we have discussed and see what problems are involved in determin- 
ing their respective validities. 

First, let us consider the occupational scales. Their primary pur- 
pose is to differentiate designated groups of business or professional 
men (or women) from men- (or women-) in-general. They will be 
valid, therefore, if they do this job well but not valid if they do this 
job poorly. Since we have 36 such scales on the men’s blank and 18
on the women’s blank, we at once find that we have not one, but 
54 validities to consider. 

Next, we have the group scales. These scales are supposed to
distinguish certain general groupings of occupations (not specific
occupations) from men- (or women-) in-general. The validity question
is analogous to that for each of the specific occupational scales but
is, nevertheless, a different one. And since there are 6 group scales,
we now find a total of 60 validities that need to be considered.

Third, we have the nonoccupational scales, interest maturity,
masculinity-femininity, and occupational level: three more scales
and three more validity problems. Does the masculinity-femininity
scale distinguish between the interests of men and those of women?
Does the occupational-level scale distinguish between those high
and low in the occupational hierarchy? Does the interest-maturity
scale differentiate between the interests of older and younger men?

A fourth type of scale is that designed to differentiate among
students engaged in different courses of study, a fifth (not reported
on in this text) is that designed to differentiate between superior
and inferior students in specific subjects, and a sixth type is designed
to differentiate between superior and inferior members of designated
occupational groups.

Altogether there must be well over 150 different validities to be
considered in relation to the purposes of the Strong Vocational
Interest Test. We cannot say, therefore, that the test as a whole is
valid or is not valid. It is valid for some purposes to a high degree, 
for other purposes to a moderate degree, and for other purposes it 
possesses no validity whatsoever. 
Occupational Scales. These scales are valid, we said above, if they 
can be shown to differentiate the members of different occupational 
groups from each other. If they did not, the test could obviously 
not be used as a basis for guiding a person toward one occupation or 
line of study rather than toward another. It would be impossible 
for us to review here the data for all scales, but it will be most in- 
structive if we follow Strong’s discussion on the differentiation of 
artists and accountants. Strong finds that accountants secure a 
mean score of 17 on the artist scale and that artists secure a mean 
score of 11 on the accountant scale. Immediately, we see that these 
scales are different. Table 20 shows the mean scores of 19
occupational groups on the artist scale, and Table 21 shows the mean
scores of these same occupational groups on the accountant scale.
In each of these tables, column 3 shows the percentage of overlapping
between the distribution of scores for artists or accountants, as the
case may be, and those for the other occupational groups. As used
by Strong, percentage of overlapping means “the percentage of
scores made by one group which could be matched with scores in
the other group.” In both tables there can be seen a steady reduction
in percentage overlapping as one proceeds from the top to bottom
row of the table.


TABLE 20. Differentiation of Artists from 19 Other Occupational Groups by Use of the
Artist Scale*

                                               Critical   Percentage
Occupation                              Mean   ratio      overlapping

Artist                                   52
[name illegible]                         33     12.8         37
Mathematician                            33     13.3         35
Musician                                 33     11.9         40
Advertiser                               30     14.1         32
Chemist                                  29     15.8         26
[name illegible]                         26     18.2         20
Minister                                 26     18.1         20
[name illegible]                         26     18.9         19
[name illegible]                         24     20.1         15
Carpenter                                24     21.1         13
President of manufacturing concern       22     20.1         15
Certified public accountant              22     21.7         13
Life insurance salesman                  22     22.8         11
City school superintendent               20     23.4         10
Production manager                       20     24.1          9
Policeman                                19     23.6         10
Personnel manager                        19     23.9         10
Banker                                   17     24.8          8
Accountant                               17     25.8          7

* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.


TABLE 21. Differentiation of Accountants from 19 Other Occupational Groups by Use
of the Accountant Scale*

                                               Critical   Percentage
Occupation                              Mean   ratio      overlapping

Accountant                               49
Certified public accountant              43      4.7         74
Banker                                   39      7.0         62
Personnel manager                        36      9.5         50
Production manager                       35     10.2         47
Policeman                                34     10.4         47
Carpenter                       [illegible] [illegible]      43
[name illegible]                         32     12.6         37
President of manufacturing concern       31     12.1         39
City school superintendent               30     14.1         32
[name illegible]                         29     13.5         34
Mathematician                            28     16.1         27
Musician                                 26     15.2         27
[name illegible]                         26     16.6         26
Lawyer                                   26     16.3         26
Advertiser                               25     16.3         27
Life insurance salesman         [illegible]     15.7         27
Minister                                 20     21.3         14
Physician                                19     22.9         11
Artist                                   11     28.4          6

* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.

We have no way of knowing, apart from the data secured by
means of the Strong Vocational Interest Test, what the “true”
overlap between the scores for artists and accountants may be. But
the fact that there can be as little overlap as 7 per cent (between
artists and accountants) or 4 per cent (between nurses and life
insurance saleswomen) would seem to validate the assumption that
the members of different occupational groups can be differentiated
from one another. The Strong Vocational Interest Test can be
considered a valid test in that it reveals these differences.

Not all scales show so little overlapping as the artist and ac- 
countant scales or as the nurse and life insurance saleswoman scales. 
The musician and artist scales, for example, overlap to the extent of 
40 per cent, and the accountant and banker scales overlap to the 
extent of 62 per cent. It seems reasonable that musicians and artists 
should overlap more than accountants and artists. And it seems 
reasonable that accountants and bankers should overlap to a greater 
extent than accountants and artists. But whether the true overlaps 
are 40 and 62 per cent or whether they should be reversed, there is 
no way of telling. 
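
To make the idea of percentage overlapping more concrete, the sketch
below computes the overlap of two score distributions from their
means. It rests on assumptions that are ours and not Strong's: both
distributions are taken to be normal, both are given the same standard
deviation, and that standard deviation is set to a hypothetical value
of 10 score points. Under those assumptions the overlap depends only
on how many standard deviations separate the two means.

```python
from statistics import NormalDist


def percent_overlap(mean_a, mean_b, sd):
    """Percentage overlap of two normal distributions sharing one SD.

    With equal SDs the two curves cross midway between the means, so
    the shared area is twice the tail beyond that midpoint.
    """
    separation = abs(mean_a - mean_b) / sd
    return 200.0 * NormalDist().cdf(-separation / 2.0)


# Means on the artist scale from Table 20; the SD of 10 is assumed.
for name, mean in (("Musician", 33), ("Chemist", 29), ("Accountant", 17)):
    print(f"{name:<12} {percent_overlap(52, mean, 10.0):5.1f} per cent")
```

The figures this produces will not reproduce the tabled percentages
exactly, since the actual distributions and their spreads are not
given here, but they show the same pattern: the farther a group's
mean lies from the artists' mean, the smaller the overlap.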

Group Scales. Data relative to the validity of group scales are
presented in Table 22. This table shows the percentage of each
occupational group that gets a score of A on the group scale. We know
from our previous discussion that approximately 70 per cent of each
criterion group of subjects secure a letter rating of A, that is, a
standard score of 45 or more, on their own occupational scale.
Therefore, we can determine the validity of a group scale for any
occupational group by noting what percentage of its members
secures a rating of A on the group scale designed to represent them.
With this thought in mind we can see that Group Scale I possesses
its greatest validity for architects and psychologists and its least
validity for artists and dentists. Similarly, Group Scale V is seen to
be more valid for Y.M.C.A. physical directors and social-science
teachers than it is for ministers and personnel managers. We must
conclude that the group scales vary in validity, depending upon the
particular occupations involved.


TABLE 22. The Validity of Group Scales*

Scale                                        %
Group scale I:
   Artist .................................. 92
   Architect ............................... 72
   Psychologist ............................ 74
   Dentist ................................. 49
   Physician ............................... 62
   Average ................................. 70
Group scale II:
   Engineer ................................ 62
   Chemist ................................. 78
   Mathematician ........................... 72
   Physicist ............................... [illegible]
   Average ................................. 77
Group scale V:
   Y.M.C.A. physical director .............. 75
   Personnel manager ....................... 48
   Y.M.C.A. secretary ...................... 87
   Social science teacher .................. [illegible]
   City school superintendent .............. 66
   Minister ................................ 92
   Average ................................. [illegible]
Group scale VIII:
   Accountant .............................. 70
   Office worker ........................... 65
   Purchasing agent ........................ 65
   Banker .................................. 74
   Average ................................. 68
Group scale IX:
   Sales manager ........................... 66
   Life insurance salesman ................. 81
   Realtor ................................. 67
   Average ................................. 71
Group scale X:
   Advertiser .............................. 69
   Lawyer .................................. 56
   Journalist .............................. 83
   Average ................................. 69

* From Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford University,
Calif.: Stanford University Press, 1943.
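
The figure of approximately 70 per cent for an A rating can be checked
with a short calculation. The sketch assumes, for illustration only,
that standard scores in a criterion group are normally distributed
with a mean of 50 and a standard deviation of 10, and that the A
cutoff lies half a standard deviation below that mean (a score of 45).
These values are our assumptions and are not stated in this section.

```python
from statistics import NormalDist


def share_at_or_above(cutoff, mean=50.0, sd=10.0):
    """Fraction of a normal distribution lying at or above a cutoff."""
    return 1.0 - NormalDist(mean, sd).cdf(cutoff)


# Assumed A cutoff of 45 on a scale with mean 50 and SD 10.
print(f"{100 * share_at_or_above(45.0):.1f} per cent would rate A")
```

Under these assumptions a little over two-thirds of the criterion
group falls above the cutoff, which agrees with the "approximately
70 per cent" cited in the text and gives a yardstick against which
the percentages in Table 22 can be read.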

Nonoccupational Scales. In this instance we can almost say that
the scales are valid by definition. Strong defined carefully that which
was to be measured (interest maturity, masculinity-femininity, and
occupational level) and described in detail how he selected the
criterion groups involved. Since we have gone over this material, it
seems unnecessary to review it here. The only additional thought
of value in connection with our current discussion on validity is that
the scales can be considered valid when the scores are interpreted
with due regard for the manner in which the criterion groups were
selected and the scales constructed. Whenever in doubt as to the
true meaning of the scores on one of these special scales, the reader
should refer to our earlier discussion or, better yet, to Strong’s own
explanation in Chaps. 10 to 12 of his book Vocational Interests of
Men and Women.

It will not be worth our while to review the evidence for the
remaining types of scales, for it is scattered and incomplete. Of
necessity, the value of this evidence varies with the specific scales,
with the investigator, with the purpose to be served, and so forth.
Also, our primary purpose in this chapter is to understand the methods
which Strong himself has used in his research on the Vocational
Interest Test, and we have covered most of these in our previous
discussion.


SCORE INTERPRETATION 


In concluding this chapter on the Strong Vocational Interest
Test, it will be well for us to consider again the meaning to be
attached to the various scores that can be secured. In most test-score
interpretation it is customary for us to consider extreme test-score
deviates as the atypical members of some defined group. Thus we
commonly think of an I.Q. of 200 or of 50 as representing the near
extremes of the distribution of intelligence of a normal population
and as being far removed from the typical or average member having
an I.Q. of 100. In the case of the occupational scores on the Strong
Vocational Interest Test, this customary interpretation cannot be
applied. In this instance, the higher the score on an occupational
scale, the more typically like that group do we consider the
individual in question. This phenomenon comes about because of the
method by which a Strong occupational scale is constructed. To
construct the scale for artist, Strong contrasted the responses given
by 241 artists with those of 4,746 men-in-general. Upon the basis
of the differences obtained, a scoring scale was constructed. And
this scoring scale was constructed upon the basis of the hypothesis
that items showing large differences should be weighted more
heavily than items showing small differences. Therefore, when a
subject answers an item in the same way as artists, he receives a
high score on this item. If he answers an item in the same way as
men-in-general, he receives a low score on this item. Therefore, if a
subject answers many items in the same way as artists, he receives
a high score on the artist scale. This high score indicates that he has
the same interests as artists, and if this is the case, it means he is
like the typical artist. On the other hand, if the subject answers more
items like men-in-general, he receives a low score as artist. The lower
this score, the more he is like men-in-general. Thus an extremely high
score indicates that our subject is like the typical artist, while an
extremely low score indicates that our subject is like the typical
man-in-general.
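
The logic of this scoring can be sketched in a few lines. Everything
in the sketch is hypothetical: the three items, the percentages of
artists and of men-in-general who say they like each one, and the rule
that turns a difference into a weight. Strong's actual weighting
procedure is not described in this section, and the rule used here
(one weight point per ten percentage points of difference) is only a
stand-in for the principle that larger differences weigh more.

```python
# Hypothetical percentages answering "Like" to three made-up items.
ARTISTS = {"sketching": 80, "bookkeeping": 10, "reading": 52}
MEN_IN_GENERAL = {"sketching": 30, "bookkeeping": 40, "reading": 50}


def item_weights(criterion, reference):
    """Assumed rule: one weight point per ten points of difference."""
    return {item: round((criterion[item] - reference[item]) / 10)
            for item in criterion}


def scale_score(liked_items, weights):
    """Sum the weights of the items the subject says he likes."""
    return sum(weights[item] for item in liked_items)


weights = item_weights(ARTISTS, MEN_IN_GENERAL)
print(weights)
print(scale_score(["sketching", "reading"], weights))  # answers like artists
print(scale_score(["bookkeeping"], weights))           # answers unlike artists
```

An item on which artists and men-in-general barely differ contributes
almost nothing, so the total is driven by the items that separate the
two groups most sharply.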




3 


INTERESTS: A RATIONAL APPROACH 


We learned in the last chapter that the development of a scale on the
Strong Vocational Interest Test can be a time-consuming and pain- 
staking task. It has taken Dr. Strong over twenty-five years to 
develop his 60 plus scales, and these cover only a few of the many 
thousands of existing occupations. To extend Dr. Strong’s pro- 
cedures to cover all occupations would seem to constitute an almost 
endless and, perhaps, a thankless task. Yet if guidance by means of 
tests is a legitimate endeavor, why should we deny this service to 
those who may wish to consider entering any of the many occupa- 
tions for which the Strong Vocational Interest Test cannot be scored? 

This was one of the considerations, among others, that led Dr. G. 
Frederic Kuder to a new approach and to the ultimate development 
of his Preference Records. In these Preference Records, as we shall 
see in detail later, an attempt is made to provide scores on a number 
of “basic” preferences having, it is supposed, differential degrees of 
significance for a variety of occupations. When the scores in these
areas are obtained, the subject or his counselor, or both together, are
supposed to be able to use them in deciding upon occupations
suitable for serious consideration. The timesaving feature, in contrast
to Strong’s approach, lies in the supposition that the preferences
measured by the Kuder Preference Records are relatively
independent and that, in differently weighted combinations, they can
be applied to almost any occupation.

In the Strong Vocational Interest Test, it is possible for two scales
to be highly interrelated. For example, scores on the physicist scale
correlate .91 with those on the mathematician scale. Therefore, if
one of these scores is known, not too much additional information
can be gained from a knowledge of the other. This does not hold, of
course, for all scales. Scores for certified public accountants and
artists, for example, correlate not at all (r = .00). Therefore,
knowledge of both of these scores tells us much more about a person’s
interests than does knowledge of either one alone. Our chief point in
bringing this up is to demonstrate that in Strong’s approach to
interest measurement there can be no advance knowledge (there
can be speculation, of course) as to how any proposed scale will
correlate with scales previously developed. If a new scale should
correlate highly with one already developed, as indeed one for
physicist did correlate with one for chemist (r = .93), there is too
little gain in knowledge to compensate for the time and effort
involved in its original development and in its later scoring. Therefore,
if we could assure ourselves ahead of time that each new scale to be
developed would not correlate with any preexisting scale, we could
feel that we would not need to run the risk of duplicating results
already obtainable.

This gives us the background for Kuder’s approach. He wanted 
scales which would not correlate with each other. Therefore he 
developed his scales by methods which would assure their maximal 
independence. It is for this reason that we call Kuder’s approach a 
rational one. He started out, not with Strong’s purpose of establish- 
ing empirical differences among occupational groups, but with the 
avowed intention of constructing uncorrelated scales. This is clearly 
a rational as opposed to an empirical objective. Kuder developed his 
scales without reference to what they might actually measure in
terms of vocational significance. This problem was to be attacked
later.


THE PREFERENCE RECORD—VOCATIONAL 


The Kuder Preference Record—Vocational (1946 revision) consists
of 160 triadic groups of items. In each of these triads the subject
is asked to indicate which of the three activities he likes best and to
indicate which of the three activities he likes least. These instructions
make it possible for the subject to show which of the three activities
is preferred to both of the other activities, which of the three
activities is preferred to only one of the other activities, and which
of the three activities is preferred to neither of the other two
activities.


An example of the type of item in the Preference Record is as
follows:

a. Visit an art gallery

b. Browse in a library

c. Visit a museum

A subject can check any one of the three activities, a, b, or c, as
most preferred and any one of the three activities as least preferred.
This leads to six possible preference orders of the three activities.
These are a, b, c; a, c, b; b, a, c; b, c, a; c, a, b; or c, b, a. In
other words, there are two ways in which activity a can be indicated as
first choice; there are two ways in which activity b can be indicated
as first choice; and there are two ways in which activity c can be
indicated as first choice. Accompanying each of these are two
correlative ways in which an activity can be placed in second position
and two correlative ways in which an activity can be placed in third
position. Kuder assigns a weight of 2 for an activity preferred to two
other activities, a weight of 1 for an activity preferred to only one
other activity, and a weight of 0 to an activity preferred to neither
of the other two activities in the triad. The total score on a scale is
obtained by a simple addition of these item weights.
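
The triad arithmetic just described can be written out as a short
sketch. The function name and the use of the letters a, b, and c as
labels are our own conveniences; the weights of 2, 1, and 0 are those
given in the text.

```python
from itertools import permutations

ACTIVITIES = ("a", "b", "c")


def triad_weights(most, least):
    """Weights for one triad, given the most and least preferred activity.

    The most preferred activity beats two others (weight 2), the
    unmarked one beats only the least preferred (weight 1), and the
    least preferred beats neither (weight 0).
    """
    if most == least or most not in ACTIVITIES or least not in ACTIVITIES:
        raise ValueError("most and least must be two different activities")
    return {act: 2 if act == most else 0 if act == least else 1
            for act in ACTIVITIES}


# The six possible preference orders, first choice listed first.
orders = list(permutations(ACTIVITIES))
for first, second, third in orders:
    print(first, second, third, triad_weights(first, third))
```

Each of the six orders corresponds to exactly one pair of "most" and
"least" marks, so two check marks per triad are enough to recover the
full ranking of the three activities.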

The Kuder Preference Record—Vocational yields 10 different
scores. These indicate preferences for activities described as
mechanical, computational, scientific, persuasive, artistic, literary,
musical, social service, clerical, and outdoor. The score on each scale
is supposed to indicate the degree of a subject’s preference for the
type of activity involved in the designated area. Raw scores are
interpreted in terms of their percentile ranks, separate norms being
available for high-school students and for adults. Kuder suggests
that an individual should seriously consider entering any occupation
involving the type of activity indicated by a scale on which he
receives a percentile rank of 75 or over and that he should seriously
consider staying out of any occupation involving the activity
indicated by a scale on which he receives a percentile rank of 25 or
less.
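
Kuder's rule of thumb for percentile ranks can be stated as a small
function. The cutoffs of 75 and 25 are those given in the text; the
function name, the wording of the three labels, and the sample profile
are hypothetical.

```python
def kuder_suggestion(percentile_rank):
    """Apply the 75-and-over / 25-and-under rule to one scale."""
    if percentile_rank >= 75:
        return "seriously consider"
    if percentile_rank <= 25:
        return "seriously consider avoiding"
    return "no suggestion"


# A made-up profile of percentile ranks on four of the scales.
profile = {"mechanical": 82, "clerical": 18, "musical": 50, "literary": 75}
for scale, rank in profile.items():
    print(f"{scale:<11} {rank:>3}  {kuder_suggestion(rank)}")
```

Scales falling between the two cutoffs carry no recommendation either
way under this rule.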


To aid a person in making an appropriate choice of occupation,
Kuder gives a list of occupations which are suitable for consideration
by those with specified scores on the Preference Record. Kuder
states that a few of the suggestions made are based on research data,
but the vast majority are based on nothing more than Kuder’s
judgment as to what occupations should be considered. We have
nothing against Kuder’s judgment, but we must point out that
judgment was involved. This is a definite weakness of the guidance
available on the strength of the scores on the Kuder Preference
Record in contrast with that available upon the basis of the scores 
provided by the Strong Vocational Interest Test. In the latter, all 
scales are based upon empirically demonstrated differences among 
occupational groups and not upon anyone’s subjective judgment 
as to whether two occupational groups may or may not be different. 

Development of the Scales. Kuder’s first step in the develop- 
ment of his Preference Record—Vocational was to prepare a list of 
200 activities. These activities were those which, on an a priori 
basis, appeared to be useful indicators of interest preference. Kuder 
arranged these items into 40 groups of five activities each and took 
care to have as many different types of activity as possible repre- 
sented in each group. He gave this form of the test to 500 Ohio State 
University students (1934-35) and asked them to rank, in order of 
preference, the activities in each of the 40 groups. 

In this preliminary edition a number of activities appeared to be 
classifiable as mechanical in nature, and another group appeared to 
be classifiable as literary in nature. Using these items, and on an 
a priori basis, item weights were assigned to indicate a liking for 
mechanical activity in preference to other types of activity and a 
liking for literary activity in preference to other types of activity.
The items in the literary scale were found to produce total scores 
possessing a split-half reliability of .85, so it appeared that they 
measured something (presumably, preference for literary activity) 
in a reasonably consistent manner. This fact led Kuder to choose 
this scale as the anchoring post for the development of a second scale. 

The next step in this further development was that of computing 
the correlations between the responses of each of the original 200 
items and the total score on the literary scale. Items found to have 
low correlations with the literary preference score were selected for 
further study. Examination of the content of these items revealed 
a sizable number which Kuder thought to be indicative of a
preference for experimental (or, later, scientific) activity. Therefore,
a core of these items was selected and weighted to constitute a new
scale. When scored, this scale produced a split-half reliability of .65.


The next thing which Kuder did was to correlate all 200 items with
this experimental scale, and all items correlating significantly with
it were incorporated therein. If an item added to the experimental
scale correlated in a significant positive direction with the literary
scale, it was balanced by another item which correlated in a
significant negative direction. In this way, the correlation between
the literary and experimental scales was kept near zero.

Kuder now examined the items not included in either the literary 
or experimental scales and selected, from these remaining items, 
those which seemed to indicate preference for artistic activity. Item 
correlations with this scale were determined, and items found to 
correlate with it were added thereto. In adding these items Kuder 
attempted to balance as nearly as possible their correlations with 
the literary and experimental scales. His goal was to use in the 
artistic scale only those items which correlated zero with both the 
literary and experimental scales, but since not every item met this 
criterion, items were balanced against each other so that the total 
score on the artistic scale would correlate as little as possible with 
the total scores on the literary and experimental scales. 
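
The balancing idea can be illustrated with a simple sketch. It is not
Kuder's documented procedure: the greedy rule, the item names, and the
correlations are all hypothetical, and the running sum of item
correlations is used only as a rough stand-in for the correlation
between total scale scores.

```python
def pick_balanced_items(item_corrs, n_items):
    """Greedily pick items whose correlations with an existing scale cancel.

    item_corrs maps each candidate item to its correlation with the
    existing scale. At each step the item that leaves the running sum
    of correlations closest to zero is added.
    """
    remaining = dict(item_corrs)
    chosen, running = [], 0.0
    for _ in range(min(n_items, len(remaining))):
        name = min(remaining, key=lambda k: abs(running + remaining[k]))
        running += remaining.pop(name)
        chosen.append(name)
    return chosen, running


# Hypothetical correlations of candidate artistic items with the literary scale.
candidates = {"i1": 0.30, "i2": -0.28, "i3": 0.02, "i4": 0.25, "i5": -0.05}
chosen, net = pick_balanced_items(candidates, 4)
print(chosen, round(net, 2))
```

Items that correlate near zero are taken first, and an item with a
sizable positive correlation is admitted only alongside one with a
comparable negative correlation, which is the sense in which items
were "balanced against each other."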

The fourth scale Kuder developed was designed to measure “social
prestige.” The procedures followed were the same as those described
in connection with the previous scales, but, in this case, the problem
was much more complex. Items had to show no correlation with the
literary, experimental, and artistic scales. Upon the completion of
the prestige scale the remaining items did not seem to lend
themselves to meaningful classification, so, from this initial set of
items, no further scales were constructed.


Kuder now collected additional items and administered these,
together with all previous items, to new groups of students. The
correlation of each item with each of the existing scales was
ascertained, and many of the items, because of these correlations, were
found suitable for inclusion in the previously developed scales.
There remained many items, however, which could not be added to
previously existing scales, so these were examined for content with a
view toward using them as the building blocks for additional scales.

Kuder tried to develop scales for athletics, religion, finance,
politics, and annoyances. For various reasons, however, none of these
scales proved satisfactory. One of the stumbling blocks encountered
in the attempt to develop these additional scales was the fact that
no scale beyond the first three (literary, experimental, and artistic)
could be developed without showing a marked correlation with the
social prestige scale. Therefore, Kuder dropped the prestige scale
and divided many of its items between two other scales now
proposed. This made it possible for Kuder to continue until seven scales
had been constructed. These seven scales are literary, experimental,
artistic, computational, persuasive, musical, and social service.

Kuder now published the test as Form A and, in this form, it was 
used extensively by personnel and guidance workers. Later, as a 
result of various criticisms, suggestions, and further study, Kuder 
felt it would be desirable to add scales for mechanical and clerical 
activities. But Kuder developed these scales only in terms of a 
criterion of internal consistency and did not concern himself with 
how the items in them correlated with the total scores on the other 
seven scales. 

Intercorrelations among Scales. The seven scales originally pre- 
pared have, as they should, fairly low intercorrelations. The highest 
correlation Kuder reports is that between the persuasive and scien- 
tific scales. This averages —.38 for six groups of subjects. Most of 
the other intercorrelations are very near zero, as the data presented 
in Table 23 demonstrate. 


TABLE 23. Intercorrelations among the Scales on the Kuder Preference Record—Vocational*

                 Computa-  Scien-  Persua-  Artis-  Liter-  Musi-   Social   Cler-
Scale            tional    tific   sive     tic     ary     cal     service  ical
Mechanical        -.038     .352   -.219     .145   -.384   -.28    -.205    -.225
Computational               .198   -.169    -.252   -.095   -.162   -.116     .464
Scientific                         -.377    -.110   -.182   -.293   -.075    -.227
Persuasive                                  -.157    .137   -.001   -.023     .104
Artistic                                            -.180    .036   -.278    -.272
Literary                                                     .093   -.131    -.143
Musical                                                             -.106    -.027
Social service                                                               -.222

* Adapted from Kuder, G. F. Revised Manual for the Kuder Preference Record. Chicago: 
Science Research Associates, 1946. 


Validity and Reliability. The reliabilities of the scales reported by 
Kuder are given in Table 24. They range from .80 to .98 and have a 
median value of .91. The groups upon which these values are based vary 
in size from 41 to 300 and include eighth-grade and high-school 
students, college students, and adults. 

We may conclude, then, that Kuder has been reasonably 
successful in reaching his main objective: that of getting reliable 
measures of nearly independent variables. Now if these scales cover 
interests in an adequate fashion, they should make for great economy 
in giving a person some idea of the scope and direction of his 
interests. It must be clearly understood, however, that on any Kuder 
scale, a score represents the extent to which a certain group of 
activities (the nature of which was subjectively determined) is 
preferred to several other types of activity. And whether any given 
score is to be considered high or low can be determined only in 


TABLE 24. Reliabilities of the Scales on the Kuder Preference Record—Vocational*

                      Num-  Mechan-  Computa-  Scien-  Persua-  Artis-  Liter-  Musi-  So-   Cler-
Group                 ber   ical     tional    tific   sive     tic     ary     cal    cial  ical
Graduate students      41   .97      .98       .95     .97      .96     .95     .95    .93   .98
College students      166   .94      .90       .93     .93      .90     .90     .90    .91   .89
College students      101   .91      .88       .88     .94      .90     .92     .85    .90   .86
College students       50   .85      .87       .91     .8?      .95     .84     .96    .92   .95
High school seniors   125   .93      .90       .90     .82      .91     .9?     .90    .87   .87
High school seniors   125   .89      .?3       .89     .80      .92     .91     .91    .93   .90
8th grade students    100   .96      .86       .92     .84      .92     .86     .93    .91   .89
Men in occupations    300   .95      .91       .89     .89      .90     .93     .94    .93   .88

* From Kuder, G. F. Revised Manual for the Kuder Preference Record. Chicago: Science 
Research Associates, 1946. 


terms of its percentile position with reference to the scores made by 
the high-school students or adults in the basic normative groups. A 
score is not high or low, as in the Strong Vocational Interest Test, 
with reference to any specific occupation. 
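
The notion of percentile position can be made concrete with a short
Python sketch. The function name and the convention of counting tied
scores as half below are assumptions chosen for illustration; the
manual's own norm tables may follow a different convention.

```python
def percentile_rank(score, norm_scores):
    """Percent of the normative group falling below `score`.

    Scores in the normative group equal to `score` are counted as half
    below and half above (an assumed, though common, convention).
    """
    if not norm_scores:
        raise ValueError("normative group is empty")
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)
```

The same raw score can therefore be high with reference to one
normative group and unremarkable with reference to another, which is why
the choice of normative group has to be stated along with the score.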

But if the scores on the Kuder Preference Record are to be used in 
guidance work, and they are, we must learn whether or not they are 
useful in distinguishing occupational groups from each other. 
Kuder’s data on this point are not so extensive as those provided by 
Strong, but he does show, in many instances, that the members of 
different occupations do in fact secure distinguishing Preference 
Record scores. Tables 25 and 26 show the mean Preference Record—Vocational 
scores secured by the members of various occupational 
groups. Kuder presents data for other occupations also, but we have 
included in these tables only those means based upon 50 or more 
cases. 

Kuder presents data for 15 occupational groups which appear 
(except for the smaller and less carefully selected populations) to be 

TABLE 25. Mean Preference Record Scores for Men*

                                Num-   Mechan-  Computa-  Scien-  Persua-  Artis-  Liter-  Musi-  So-   Cler-
Group                           ber    ical     tional    tific   sive     tic     ary     cal    cial  ical
Base group                      2,667   79       35        64      74       46      48      17     74    52
High school boys                1,858   78       35        68      67       48      47      18     62    54
Accountants                       117   68       50        61      74       41      54      18     69    62
Authors                            50   48       27        52      82       42      76      22     73    48
Mechanical engineers               60   94       37        74      74       48      45      16     65    44
Drug store managers               130   72       36        72      84       44      44      16     71    50
Secondary school teachers         120   65       36        63      69       41      53      ?      ?     51
Meteorologists                    185   84       36        77      63       45      54      17     66    46
Personnel managers                 67   67       33        60      85       40      53      17     83    49
Physical education instructors     60   67       30        64      68       40      49      17     96    40
Weather observers                  99   82       40        77      59       45      52      18     66    52
Retail managers                    82   72       36        60      85       43      45      1?     74    55
Sales managers                     89   73       31        60      95       42      51      1?     71    4?
Securities salesmen                59   66       33        55     103       43      47      19     80    50
Salesmen to consumers             130   73       32        62      95       42      47      16     72    50
Manufacturing foremen              69   93       38        71      68       48      42      14     72    51
Steel manufacturing foremen        54   94       38        70      69       50      41      13     70    51

* From Kuder, G. F. Revised Manual for the Kuder Preference Record. Chicago: Science 
Research Associates, 1946. 


TABLE 26. Mean Preference Record Scores for Women* 


Me- | Com-|... Per- i - 
Group aa chan- | puta- <3 sua- Artis Titers a 2 ge 
ical | tional | "°° sive | "° ay | cal cid 2m 
Base group........... 1,429 | 53 32 55 62| 53 53 
a . ; 21| 81| 62 
High school girls......| 2,005 | 50 29 53 66 | 52 49 24| 79| 63 
High school English ; 
teachers, 455.0505 69| 42 | 28 | 46 | se] s55 56 
High school home eco- n SE 
nomics teachers..... 94| 57 30 56 58 3 
Occupational thera- i i ena 
e 70| 73 | 26) 55 | sr} 6g | a 44 
Trained nurses....... 183 | 54 29 60 53 | 54 o A Ms 50 
Bookkeepers. . . al SOY BS 38 53 62| 51 53 23 | 82| 65 
| Office clerks.....2.... 78| 54 | 35 | 54 | 62] s | 4g | 2| a | 09 
Stenographer-typists..| 168] 51 | 30 | 55 | ss| s2 | so | 93| 78] 69 


* From Kuder, G. F. Revised Manual for the Kuder Preference Record. Chicago: Science 
Research Associates, 1946. 


similar to the standardization groups used by Strong. It should be 
instructive and profitable for us to compare the classifications of 
these 15 occupational groups as given by the Kuder and Strong 
inventories. Table 27 shows the 15 occupations, their mean scores on 


TABLE 27. Comparison of Scores on the Kuder Preference Record and the Strong 
Vocational Interest Test* 


Kuder Strong 
Occupation Scientific Persuasive Chemist pone 8 
Mean | Rank | Mean | Rank| r | Rank| r Rank 
-| 
Accountatit os sa pa ainakaan 6l 7 74 8 | —.16 9 —.02} 10 
Author. .. s2 | 14 | 82 | 4| -06 6 i 7 
79 2 64 13 7) 2 —.69| 14 
86 1 63 Ié eee 1 — .84) 15 
62 6 74 8 | —.42) 12 36 $ 
58 12 79 5 | —.31| 10.5 47| 3 
57 13 91 2 | —.84 15 | ..... 1 
77 3 57 15 56) 3 — .64 13 
60 9 66 12 | —.10} 7 09} 8.5 
51 15 71 10 a S —.23| 11 
60 9 85 3 | —.31| 10.5 31] 6 
67 4 78 6 40) 4 —.38| 12 
«| Q Ş 95 1 | =-74 14 || .82) 2 
eect ` 59 11 74 8 | —.59| 13 44| 4 
Ysical education instructor... .| 64 5 68 11 =11] 8 .09} 8.5 


* Adapted from Strong, E. K., Jr. Vocational Interests of Men and Women. Stanford 
University, Calif.: Stanford University Press, 1943; and from Kuder, G. F. Revised Manual for 
the Kuder Preference Record. Chicago: Science Research Associates, 1946. 


the Kuder persuasive and scientific scales, and their correlations 
with the chemist and life insurance scales on Strong’s test. A careful 
study of this table reveals the following relationships: 


1. Ranks based on the Kuder science scale and on the Strong chemist scale are 
highly and positively correlated. 

2. Ranks based on the Kuder persuasive scale and on the Strong life insurance 
scale are highly and positively correlated. 

3. Ranks based on the Kuder science scale and on the persuasive scale are highly 
and negatively correlated. 

4. Ranks based on the Strong chemist scale and on the life insurance scale are 
highly and negatively correlated. 

5. Ranks based on the Kuder science scale and on the Strong life insurance scale 
are highly and negatively correlated. 

6. Ranks based on the Kuder persuasive scale and on the Strong chemist scale 
are highly and negatively correlated. 
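
Relationships of this kind can be checked by a rank correlation. The
following Python sketch is an illustration only: the function names are
our own, and the coefficient computed is the ordinary product-moment
correlation applied to ranks (Spearman's coefficient), which is assumed
here rather than stated in the sources. Tied values share the mean of
the ranks they occupy, the same convention that produces ranks such as
8.5 and 10.5 in Table 27.

```python
from math import sqrt


def average_ranks(values, descending=True):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_correlation(x, y):
    """Product-moment correlation between the ranks of x and the ranks of y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)
```

Applied to two columns of Table 27, a value near +1 would correspond to
"highly and positively correlated" and a value near -1 to "highly and
negatively correlated."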


These results show, it must be admitted, a striking resemblance. 
This appears truly remarkable in view of the markedly different 
approaches used by Strong and Kuder in the development of their 
inventories. The similarity of results, in view of these extremely 
divergent approaches, shows the rankings to possess a considerable 
degree of substantial or psychological validity. They demonstrate 
that important practical results can be reached through a theoretical 
or rational approach, as well as through an empirical one. We must 
point out in this connection, however, that results on the Kuder 
Preference Record and on the Strong Vocational Interest Test do 
not agree in all particulars as well as they do in our illustration. 
And when they differ, it would seem the safer course to assume that 
Strong’s data possess the greater validity. 

Short Form. We have up to this point been discussing only one of 
three Kuder Preference Records. It is available in two formats 
(Forms BB and BM) for hand or machine scoring. A shorter form 
consisting of 120 groups of activities rather than of 160 groups of 
activities, as in the longer form, is known as the short form (BI). 
Kuder developed this form by selecting the “better” groups (triads) 
of items from the longer form. It is to be used, says Kuder, in those 
situations in which time cannot be made available for the longer 


TABLE 28. Reliabilities of the Scales on the Short Form of the Kuder Preference Record— 
Vocational and Their Correlations with the Scales on the Long Form* 


| 
Me- | Com-|.. Per- i y 7 Tler- 
Group chan- | puta- sii sua- Arus Liter- | Musi- Social e 
seal sonal tific | -aive tic | ary cal ica 
-| - — = = a ee 
Intercorrelations: | | 
Men (N = 100)... .... 8| .97| 97| .96| | 3| 97| 97| -98 
Women (N = 100)....| 96| .97| ‘95| ‘oel ‘os | 93| .97| 26) 2 
— |_ zal eo A ae 
Reliabilities: | | 
Men (N = 100)....... | zoe -86 -90 M 


Women (N = 100)...., 88 | 88 


.87 86 


| a 
.90 | 30| 8) .88 -85 
837| .90| 88| .90| -2 


| 
* From Kuder, G. F. Revised Manual for the Kuder Preference Record. Chicago: Science 
Research Associates, 1946. 


form. The reliabilities of the scales on the short form and their 
correlations with the scores on the long form are given in Table 28. 


THE PREFERENCE RECORD—PERSONAL 


The third Kuder Preference Record is known as the Kuder Prefer- 
ence Record—Personal. It yields scores on five types of activities not 
heretofore discussed: sociable, practical, theoretical, agreeable, and 
dominant. Kuder’s definition of each of these scales is as follows: 


Sociable. This scale measures expressed preferences for personal activities of a 
sociable nature. A preference for taking the lead and being in the center of activities 
involving people. 

Practical. This scale measures expressed preference for personal activities of a 
practical nature. A preference for dealing with practical problems and everyday 
affairs rather than interest in imaginary or glamorous activities. 

Theoretical. This scale measures expressed preference for personal activities of a 
theoretical nature. A preference for thinking, philosophizing, and speculating. 

Agreeable. This scale measures expressed preference for personal activities of an 
agreeable nature. A preference for pleasant and smooth relations which are free 
from conflict. 

Dominant. This scale measures expressed preference for personal activities of a 
dominant nature. A preference for activities involving the use of authority and 
power. 


The principles followed in the construction of the Kuder Preference 
Record—Personal were much the same as those used in the 
development of the Kuder Preference Record—Vocational. Therefore, 
they need not be reviewed in any great detail. Kuder’s aim 
was to devise scales measuring factors not already covered by the 
vocational inventory and, of course, which did not intercorrelate 
among themselves. 

The preliminary form of the personal inventory included almost 
900 groups of activities in triads. Seven tentative scales were 
developed. Then the item correlations with these scales, as well as with 
those on the vocational inventory, were obtained. Following 
procedures we have already described, items were added to the various 
scales in such a way as to augment reliability and to minimize 
intercorrelations. New items were then assembled, the scales were further 
augmented, and eventually two of the proposed seven scales were 
dropped. This left the five scales on which the inventory can now be 
scored. One of the scales has a reliability of .84, two have reliabilities 
of .85, and two have reliabilities of .86. These are all lower than for 
the vocational inventory. The intercorrelations among the scales are 
shown in Table 29. 


TABLE 29. Intercorrelations among the Scales on the Kuder Preference Record—Personal*

Scale          Sociable   Practical   Theoretical   Agreeable
Practical       -.26
Theoretical      .1?       -.03
Agreeable       -.06        .21          .24
Dominant         .32       -.22          .25          -.21


* From Kuder, G. F. Examiner Manual for the Kuder Preference Record—Personal. Chicago: 
Science Research Associates, 1948. 


RATIONAL VS. EMPIRICAL APPROACH 


Throughout this chapter we have made many references to 
Kuder’s approach as a rational one and to Strong’s approach as an 
empirical one. In concluding this chapter, we should like to bring 
this difference into even sharper focus, because the differences in the 
procedures followed by Strong and Kuder have not received the 
attention they deserve. Too many personnel psychologists, voca- 
tional counselors, and guidance workers assume that an interest 
test is an interest test is an interest test, and let it go at that. We 
cannot urge too strongly that a thorough understanding of the 
different procedures followed by Strong and Kuder is fundamental 
to giving proper guidance and counsel in the field of interest 
measurement. 

To repeat, Strong’s approach can be called empirical. Not that 
theory is not involved, but Strong started out to secure data which 
would show that different occupational groups could be differentiated 
in terms of the interests of successful men in each of several 
occupations. The development of each scale followed an empirical finding 
that one occupational group could be differentiated from another 
or from a men-in-general group. If the empirical finding showed that 
a given occupational group could not be differentiated from any 
other occupational group or from a men-in-general group, no scale 
was developed. Obviously, it would serve no purpose. 


Kuder’s approach can be called rational. Not that empiricism is 
not involved, but Kuder started out to secure scales which would 
have no correlation with each other. He wanted to differentiate 
people from one another but upon a basis that one method of classi- 
fication would yield results entirely independent of those yielded by 
a second method of classification. Kuder felt that the empirical 
application of his scales should follow their rational development. 
Strong felt that the scales should be developed originally in an 
empirical manner. 

Kuder felt that the rational development of scales would lead to 
considerable economy in the description of the interests a person 
might possess. Thus if several independent variables can be demon- 
strated to offer a fairly complete picture of the interests which a 
person is likely to possess, it does not seem unreasonable to suppose 
that such a scale should be found useful whenever a measure of 
interests is desired. Theoretically, there is some limit to the number 
of scales which Kuder might find it useful to develop. This number is 
reached as soon as it is discovered that an additional scale adds no 
information independent of the preceding scales. 

In Strong’s approach there is no limit to the number of scales 
which might be developed. Theoretically, there could be a scale for 
each occupational group now in existence. Thus the number of 
scales might be twenty or thirty thousand. This, of course, is carrying 
things to an extreme. Certain occupations are so much alike that 
it is pointless to have a separate scale for each of them. There is no 
definite and clear-cut way of finding this out, however, until separate 
scales for the two groups have been developed. Thus Strong’s 
approach leaves open the possibility that much labor may be expended 
in developing a new scale only to find that it already duplicates 
closely a previously existing scale. The only preventive to this is, of 
course, the investigator’s own good judgment, and, unfortunately, 
we know this cannot always be relied upon. 

Another way in which the Strong and Kuder inventories differ 
from each other is in the groups for which they were designed. 
Strong’s inventory was designed primarily to serve the needs of the 
college student looking toward a professional career. It is designed 
to help such a student, in the light of his interests, focus his attention 
on an occupation or on a group of closely related occupations 
which would be appropriate for him to consider. Other things such 
as ability being equal, Strong theorized that a person would find 
himself happier and more successful in an occupation where he 
would find men with interests similar to his own. He would not 
definitely be a failure, but presumably less happy and less successful 
if he were to enter an occupation which attracted, for the most part, 
men with interests largely different from his own. 

Kuder’s Preference Records are designed, primarily, for the 
high-school student, for the younger student, and for the student who is 
not yet ready to pinpoint his efforts in preparation for a specific 
occupation. They are also designed with the thought that they should be 
useful to a broader class of students than those who are slated only 
for a job in one of the professions. Kuder’s Preference Records are 
geared to lead the student to a consideration of a broad or general 
field, such as the physical sciences, without concern as to whether the 
ultimate career is to be in physics, chemistry, or engineering. Or they 
will lead him to consider literary activity without regard as to 
whether the ultimate career is to be in teaching English, in being a 
playwright, or in becoming poet laureate. 

As a general field usually has to be selected much earlier than a 
final field of specialization, it would seem that an appropriate joint 
use of Kuder’s and Strong’s interest inventories, at younger and 
older ages, might well be considered as part of any complete 
vocational guidance program. 


4 


ATTITUDES: AN A PRIORI APPROACH 


We learned in the last two chapters that the concept interest covers 
such things as our likes and dislikes, our preferences, and our 
aversions. In contrast, the concept attitude, with which we are to deal in 
this and in the next chapter, covers our beliefs. We believe something 
is right or that something is wrong. We favor this and object to that. 
We accept this position and reject that position. This believing or 
disbelieving, this favoring or not favoring, this accepting or 
rejecting, constitute expressions of attitude. 

To illustrate concretely the difference between an interest and an 
attitude, let us consider the statement “I like bananas.” Are we to 
consider this statement expressive of an attitude toward bananas 
or expressive of an interest in bananas? Ordinarily, we would classify 
this statement as an expression of interest, not as an expression of 
attitude. We do this because there is implied in this statement no 
acceptance or denial of any belief about bananas. The statement 
“I like bananas” implies nothing at all as to whether I think it is a 
good or a bad thing for me to do so. The statement “Bananas are 
good for children” we would classify as an expression of attitude. It 
clearly implies a certain belief about bananas, namely, that they are 
good for children. 


Our interest in bananas and our attitude toward bananas are two 
independent concepts. We can believe that bananas are good for 
children (attitude), but still not like them (interest). We can believe 
that bananas are not good for children (attitude), but we can, 
nevertheless, like them (interest). We can believe that bananas are good 
for children (attitude), and can also like them (interest). And last 
of all, we can believe that bananas are bad for children (attitude), 
and we can also dislike them (interest). We may summarize our 
discussion by saying that an interest is an expression of feeling, whereas 
an attitude is an expression of belief. 

In the field of attitudes, as in the field of interests, there are two 
principal measuring techniques to be discussed. These are Thurstone’s 
method of equal-appearing intervals and Likert’s method of 
summated ratings. We have selected these methods for discussion 
not only because they are the two principal methods in the field, but 
also because they illustrate two diametrically opposite approaches 
to the central problem involved. 

This central problem is the scaling of test items. In Thurstone’s 
method, the method of equal-appearing intervals, the scaling of test 
items takes place before the collection of attitude data. For this 
reason we call it an a priori approach. In Likert’s method, the method 
of summated ratings, the scaling of test items takes place after the 
collection of attitude data. It is, in fact, dependent upon them. For 
this reason we call it an a posteriori approach. We shall discuss the 
latter approach in Chap. 5. The remainder of this chapter will be 
given over to a discussion of the method of equal-appearing intervals. 

The first complete description of the method of equal-appearing 
intervals as applied to the measurement of attitudes appeared in a 
monograph published in 1929. This monograph, “The Measurement 
of Attitude,” represented the joint efforts of Professors L. L. Thur- 
stone and E. J. Chave of the University of Chicago. In this mono- 


TABLE 30. Equal-appearing-interval Attitude Scales 
Edited by Professor L. L. Thurstone 


Attitude toward the Bible 
Attitude toward birth control 
Attitude toward capital punishment 
Attitude toward censorship 
Attitude toward the Chinese 
Attitude toward the church 
Attitude toward communism 
Attitude toward the Constitution 
Attitude toward divorce 
Attitude toward the economic position of women 
Attitude toward evolution 
Attitude toward foreign missions 
Attitude toward free trade 
Attitude toward freedom of speech 
Attitude toward German war guilt 
Attitude toward the Germans 
Attitude toward God (belief in the reality of) 
Attitude toward God (influence on conduct) 
Attitude toward honesty in public office 
Attitude toward immigration 
Attitude toward the law 
Attitude toward the League of Nations 
Attitude toward the Monroe Doctrine 
Attitude toward the Negro 
Attitude toward patriotism 
Attitude toward preparedness 
Attitude toward prohibition 
Attitude toward public office 
Attitude toward public ownership 
Attitude toward the social position of women 
Attitude toward Sunday observance 
Attitude toward the treatment of criminals 
Attitude toward unions 
Attitude toward war 


graph, Thurstone and Chave illustrate the method of equal-appearing 
intervals as they applied it in developing a scale to measure 
attitude toward the church. Following the publication of this 
monograph, Thurstone or his students, or both together, developed 
the scales listed in Table 30. 

Many investigators have followed Thurstone’s lead and have 
developed their own equal-appearing-interval attitude scales. Not 
all these scales have been published, however, so it is difficult to 
estimate their total number. But they certainly must exceed 300. 


DEVELOPMENTAL STEPS 


The basic premises in the method of equal-appearing intervals are 
that a series of statements can be made to serve as the markers on a 
yardstick for the measurement of attitudes; that each of these 
statements will represent a specified degree of acceptance or rejection 
of a belief; and that these specified degrees of acceptance or rejection 
will be equally spaced throughout the entire range of the attitude 
continuum. The theory is that, if a person will indicate which of the 
statements he will accept and which he will reject, we can locate 
him at a definite position on the attitude continuum. Our problems 
in scale construction are to select an appropriate series of statements 
and to determine what position on the attitude continuum each 
of our statements represents. These problems are solved by 
Thurstone and Chave in seven major steps. These steps are: 


1. The collection of a preliminary list of statements 
2. The evaluation of these statements 
3. The determination of scale values 
4. The selection of a final list of statements 
5. The elimination of ambiguous statements 
6. The elimination of irrelevant statements 
7. The collection of normative data 


In the following paragraphs we shall add the details which will 
illustrate the purpose to be served by each of these steps, and we 
shall discuss some of the problems which each of these steps involves. 

Collection of Statements. Statements can be made up by the 
investigator, they can be suggested by colleagues, they can be 
clipped from magazines and newspapers, and so forth. In all cases, 
they should represent the entire attitude continuum: from complete 
acceptance of a belief to its complete rejection. 

We cannot say in advance how many statements should be col- 
lected, but 200 or 300 will not be too many. Thurstone and Chave, in 
their study, however, used only 130. At the other extreme, Ferguson 
(in a study we shall describe in Chap. 13) reports using more than 
600. The number of statements will vary with the needs to be met 
and with the insight of the investigator. 

After statements have been collected, they must be edited. Many 
statements, upon close scrutiny, will not seem so pertinent as they 
once did. And, in the hustle and bustle of collecting them, careless 
phraseology may have been employed. Editing will usually reveal a 
number of double-barreled and ambiguous statements. Duplication 
of content will be found, and many of the statements will be seen 
to be more effective if reworded. 

Thurstone and Chave present a number of useful rules to follow 
in this editing process, but a more complete set has been presented by 
one of Thurstone’s students, C. K. A. Wang. Close adherence to 
these rules will help materially in the preparation of significant and 
adequate sets of stimulus statements. The more important of Wang’s 
rules are as follows: 


1. Each statement must be debatable. That is, it must reflect opinion, not fact. 
2. Each statement should be relevant to the attitude variable under consideration. 
3. Each statement should be subject to just one interpretation. 
4. Each statement should be simple, not compound. 
5. Each statement should be short. 
6. Each statement should be complete in denoting a definite attitude toward a 
specific issue. 
7. Each statement should contain one complete thought. 
8. Each statement should be clear-cut and direct. 
9. Each statement should be st 
10. Each statement should cont 


When a sufficient number of statements has been collected and 
edited, we are faced with the task of determining what position on 
the attitude continuum each of the statements will represent. We do 
this by asking judges to sort the statements into various categories. 
Directions to Judges. Thurstone and Chave mimeographed their 
statements on small slips of paper. Each statement was put on a 
separate slip. Then these slips were handed to a number of judges 
who were instructed as follows: 


1. The 130 slips contain statements regarding the value of the church. These 
have been made by various persons, students, and others. 

2. As a first step in the making of a scale that may be used in a test of opinions 
relating to the church and religion we want a number of persons to sort these slips 
into eleven piles. 

3. You are given eleven slips with letters on them: A, B, C, D, E, F, G, H, I, J, 
K. Please arrange these before you in regular order. On slip A put those statements 
which you believe express the highest appreciation of the value of the church. On 
slip K put those slips which express the strongest depreciation of the church. On 
the rest of the slips arrange statements in accordance with the degree of appreciation 
or depreciation expressed in them. 

4. This means that when you are through sorting you will have eleven piles 
arranged in order of value-estimate from A, the highest, to K, the lowest. 

5. Do not try to get the same number in each pile. They are not evenly 
distributed. 

6. The numbers on the slips are code numbers and have nothing to do with the 
arrangement in piles. 

7. You will find it easier to sort them if you look over a number of the slips, 
chosen at random, before you begin to sort. 

8. It will probably take you about forty-five minutes to sort them. . . . 
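
Once the judges have sorted the slips, each statement carries a set of
pile letters, one from each judge. The Python sketch below shows one way
such judgments might be summarized. It assumes, as is customary with
this method, that the piles A to K are numbered 1 to 11, that the scale
value of a statement is the median pile position, and that the
interquartile range serves as an index of ambiguity. The function names
and the interpolation rule are illustrative choices of our own and are
not taken from the directions quoted above.

```python
PILES = "ABCDEFGHIJK"  # A = highest appreciation, K = strongest depreciation


def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def scale_value_and_spread(placements):
    """Summarize the judges' placements of a single statement.

    placements: iterable of pile letters, one per judge, e.g. "CDDDEEF"
    Returns (median pile position, interquartile range of positions).
    """
    positions = sorted(PILES.index(letter) + 1 for letter in placements)
    if not positions:
        raise ValueError("no judgments supplied for this statement")
    median = quantile(positions, 0.5)
    spread = quantile(positions, 0.75) - quantile(positions, 0.25)
    return median, spread
```

A statement on which the judges agree closely yields a small spread; a
statement scattered over many piles yields a large one and would be a
candidate for elimination as ambiguous.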


Since Thurstone and Chave’s pioneering study, a number of 
variations in instructions have been tried. Among the more 
important of these variations we may list those described by Seashore 
and Hevner, by Farnsworth, and by Ferguson. 

The principal change suggested by Seashore and Hevner is to 
mimeograph the statements as a list, together with a series of letters 
or numbers in front of each statement. This avoids the necessity 
of separate slips and of their sorting into different piles. Judges 
indicate their evaluations by drawing circles around letters or 
numbers. Seashore and Hevner report that results are the same as 
those secured by Thurstone and Chave but that these results are 
secured with a considerable saving in time. 

We are describing the steps in what is known as the
equal-appearing-interval method of attitude measurement. This designation
conforms to historical usage, but Farnsworth has raised a question
as to whether raters operate, when rating statements, within the
framework implied in the strict psychophysical sense of the term
equal-appearing intervals. Farnsworth wonders whether a judge
considers position 7, for example, as equidistant between positions
6 and 8. Perhaps, says Farnsworth, some judges consider position
7 a little closer to position 8, and some judges consider it a little
closer to position 6.

To check on this theory, Farnsworth asked a group of judges to
evaluate the statements in Form A of Peterson’s scale for the
measurement of attitude toward war and to do this in accord with the
Thurstone-Chave directions. When the judges had completed this
task, Farnsworth asked them to indicate whether position E had
been considered to be exactly halfway between positions D and F
or whether it had been considered as more militaristic than D but
not at a point exactly halfway between D and F. Of the judges who
replied, 63.5 per cent answered in the affirmative to the second
alternative. They did not consider position E as exactly halfway
between positions D and F.

In further exploration of the judgmental processes involved,
Farnsworth asked another group of judges to indicate on a line with
a neutrality point indicated, the positions of extreme pacifism and
extreme militarism. Farnsworth argued that, if the
equal-appearing-interval process were operating, the mean distance of
the point representing extreme pacifism should be just as far from the
neutrality point as the mean distance of the point representing extreme
militarism. Farnsworth’s judges did not place the extreme points at
equal distances from the point of neutrality, so he concluded that
they were not operating within the framework implied by the concept
of equal-appearing intervals.

In view of these results, Farnsworth has suggested graphic
line approaches which avoid the use of the equal-appearing-interval
concept. The results of these approaches are much the same . . .

One of Farnsworth’s graphic line approaches involves the following 
set of instructions: 


You are to read the statement . . . they are. Note that the extreme left of
the . . . militarism of the item. Place the number of the items over the dots.
More than one item may be placed on the same point on the line.


The material for Farnsworth’s second graphic line procedure 
consists of a sheet with 18 vertical lines exactly 11 centimeters long 
and numbered 2 to 19. “These lines are bounded by two horizontal 
lines numbered 20 (top) and 1 (bottom).” The directions presented to a
judge are as follows:


You are to estimate the degree of pacifism or militarism of 18 statements
(numbered 2 through 19) relative to the pacifism-militarism of two sample
statements. Consider the top horizontal line to represent the value of a very
militaristic sample, and the baseline that of a very pacifistic sample. For
example, if a certain statement seems midway in value between the pacifistic and
militaristic samples check its rating line at the midpoint; if it seems closer in
value to one of the samples than to the other put a check mark closer to the
appropriate sample. None of the statements will be as pacifistic as the pacifistic
sample (statement 1) nor as militaristic as the militaristic sample (statement
20). Each statement has its particular line on which its value is to be estimated.


As we said before, Farnsworth’s procedures avoid the use of the
equal-appearing-interval concept, yet they give results closely allied
to those of the Thurstone and Chave and the Seashore and Hevner
techniques.

Ferguson has used various sets of instructions. Some of these have
been in strict conformity with the Thurstone and Chave procedure,
and others have not. Among the latter we can mention two for our
present consideration. Both of these were used in the development
of employee merit-rating scales. The first set was used in the
development of a series of clerical merit-rating scales, and the second
set was used in the development of an evaluation form for assistant
managerial personnel. Ferguson has reported no comparison of the
results secured by these directions with those obtained by a strict
adherence to the Thurstone and Chave sorting procedure. But
judging from the results reported by Seashore and Hevner, and by
Farnsworth, it does not seem rash for us to assume that they would
be much the same.

Number of Statements for Each Scale Position. Thurstone and
Chave made no attempt to control the number of statements which
a judge could place in each pile. But they proceeded to disregard the
ratings of any judge who placed 30 or more statements, out of 130,
in one pile. They did this on the assumption that a judge who placed
this many statements in one pile lacked the ability, or at least did
not take the trouble, to make an adequate number of discriminations
among the statements to be evaluated.
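
The rejection rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the data layout, and the two sample judges are hypothetical, and only the cutoff of 30 statements out of 130 comes from the text.

```python
from collections import Counter

MAX_PER_PILE = 30  # cutoff reported in the text for a pool of 130 statements


def careless_judges(sortings, cutoff=MAX_PER_PILE):
    """Return ids of judges whose largest pile holds `cutoff` or more slips.

    `sortings` maps a judge id to a dict of {statement number: pile letter}.
    """
    rejected = []
    for judge, piles in sortings.items():
        largest_pile = max(Counter(piles.values()).values())
        if largest_pile >= cutoff:
            rejected.append(judge)
    return rejected


# Hypothetical data: judge "j1" spreads the 130 slips over piles A..K in
# rotation; judge "j2" drops the first 40 slips into pile F.
letters = "ABCDEFGHIJK"
sortings = {
    "j1": {n: letters[n % 11] for n in range(1, 131)},
    "j2": {n: ("F" if n <= 40 else letters[n % 11]) for n in range(1, 131)},
}
print(careless_judges(sortings))
```

Only the second judge is flagged, because no pile of the first judge comes near the cutoff.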
Ferguson, as we can see from the two sets of directions just
mentioned, has secured evaluations with and without control of the
number of statements to be given any one evaluation. These
contrasting approaches appear to produce no difference, however, in the
distribution of statements throughout the intervals of the scale.
All distributions show a marked tendency for judges to interpret a
statement as favoring either the acceptable (or positive) or the
unacceptable (or negative) portions of the continuum, as well as
a resistance to judging a statement as having neutral significance.

Number of Intervals. Thurstone and Chave, and all of Thurstone’s
collaborators and students, have divided their attitude continua
into 11 intervals. Ferguson has used 9 and 7 intervals. Orear and
Waldenfels have used 5. The number of intervals to be considered
. . . attitude variable which is to be . . . preferences of the
investigator. Anyone . . . equal-appearing-interval scale will . . .
Symonds’s data on the reliabilities . . . of step intervals. These data
can be . . . Symonds’s text Diagnosing Personality and Conduct.
Source and Number of Judges. College . . . accessibility to college
professors and their . . . to participate in such studies, more
frequently . . . asked to serve as judges. This is not . . . one time
implied. Other . . . at the Prudential Insurance Company, . . .
Continental Illinois National Bank and Trust Company, Jurgensen
at the Kimberly-Clark Company, Gibbons at the Owens-Illinois
Glass Company, Knauft at the Federal Bake Shops, and Ferguson
at the Metropolitan . . . those who have used nonstudent . . .
evaluation.

. . . evaluations can be secured . . . Therefore, in most situations . . .
The only advantage which appears to accrue from the use of the
larger groups is the additional sales appeal to potential users of the
completed scales. We need not be arbitrary, but the occasions will
be rare when more than, let us say, 50 judges will be needed.
Determination of Scale Values. Our next steps are to determine
for each statement its median evaluation and its ambiguity. To 
determine these values we first count the number of times a state- 
ment is allocated to each scale position. This gives us a simple 
frequency distribution. Second, we convert this simple frequency 
distribution into a cumulative frequency distribution. And, third, 
we convert this cumulative frequency distribution into a cumulative 
percentage distribution. 

What we need next are three percentile points: Q1, Q2, and Q3.
These are the 25th, 50th, and 75th percentile points, respectively.
These can be computed arithmetically from our cumulative
percentage distribution or graphically. The 50th percentile of our
distribution constitutes the median value of our statement. The 75th
and 25th percentiles are used in determining its ambiguity. This is
found by the formula: (Q3 - Q1)/2. Thus our index of ambiguity is
one-half the difference between the 75th and 25th percentiles. It
is, in fact, nothing more than the well-known quartile deviation.
Statements which are ambiguous are assigned a large variety of
ratings and have large quartile deviations. Statements with precise
and definite meanings are assigned a small variety of ratings and
have relatively small quartile deviations.
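
The arithmetic route to the median and the quartile deviation can be sketched as follows. The ratings are hypothetical, and the sketch assumes that scale position k covers the interval from k - 0.5 to k + 0.5 and that ratings are spread evenly within an interval; the text does not fix either convention, so the exact values depend on that choice.

```python
def cumulative_percentages(counts):
    """Turn a simple frequency distribution into cumulative percentages."""
    total = sum(counts)
    running, result = 0, []
    for count in counts:
        running += count
        result.append(100.0 * running / total)
    return result


def percentile_point(counts, pct):
    """Scale point below which `pct` per cent of the ratings fall.

    Assumes position k spans k - 0.5 to k + 0.5 (an illustrative convention)
    and interpolates linearly inside the interval that contains the point.
    """
    previous = 0.0
    for position, upper in enumerate(cumulative_percentages(counts), start=1):
        if upper >= pct and upper > previous:
            lower_limit = position - 0.5
            return lower_limit + (pct - previous) / (upper - previous)
        previous = upper
    raise ValueError("pct must lie between 0 and 100")


# Hypothetical ratings of one statement by 100 judges on an 11-point scale:
# counts[0] is the number of judges choosing position 1, and so on.
counts = [0, 0, 0, 0, 2, 6, 20, 40, 22, 8, 2]
q1 = percentile_point(counts, 25)
median = percentile_point(counts, 50)   # the scale value of the statement
q3 = percentile_point(counts, 75)
ambiguity = (q3 - q1) / 2               # the quartile deviation
print(round(median, 2), round(ambiguity, 2))
```

A statement whose ratings were scattered over more positions would give a larger value of `ambiguity` from the same computation.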
In the section on evaluation of statements, we discussed several
factors which could have some effect upon the results of the process
of statement evaluation. We discussed variations in the instructions,
the number of preliminary statements to be collected, the number of
intervals on the continuum, and the number of judges to be used.
These factors are basic and fundamental, because some decision
must be made on each one before any of our statements can be
evaluated. We now find it necessary to discuss a number of additional
factors which can affect the statement evaluation process. These
factors are as follows:

1. Errors in judgment

2. Errors due to position or number preference

3. The personal attitudes of the judges

4. The cultural context in which the scale is constructed

5. The relevance of the attitude variable as a personal issue to the rater

6. The status of the raters in relation to that of the persons whose attitudes are
to be measured


. . . should have placed it in position 6 or 8. This may be due to
carelessness . . . There is no absolute safeguard against this type of
error. Therefore, judges should be carefully . . . ample time to make
their evaluations. . . . the evaluation procedure is to some extent
self-correcting for errors of this type, because of its requirement that
more than one judge submit evaluations. The errors made by one
judge will be counterbalanced by the errors made by another judge,
and this mutual cancellation of errors will leave some residue of
truth in the mean or median evaluations.

. . . supplied by each judge into standard scores before averaging them
with those given by other judges. The increased degree of precision, as
measured by reduction in the size of the standard deviations of the
final scale values, is demonstrated in the data given in Table 31.


Table 31. Variability Data for Statements in an Equal-appearing-interval Scale

                     Standard deviations
           Based on            Based on            Percentage of
Scale      raw scores          standard scores     reduction
value      Set A    Set B      Set A    Set B      Set A    Set B
  9        1.79     1.60       .71      .63        60.3     60.6
  8        1.47     1.49       .54      .51        63.3     65.8
  7        1.24     1.55       .47      .43        62.1     72.3
  6        1.53     1.78       .41      .31        73.2     82.6
  5        2.33     2.51       .34      .26        85.4     89.6
  4        1.84     1.70       .31      .23        83.2     86.5
  3        1.21     1.23       .32      .34        73.6     72.4
  2        1.45     1.43       .44      .48        69.7     66.4
  1        1.58     1.61       .56      .56        64.6     65.2

Perhaps this degree of increased precision cannot always be
expected, but there seems to be no reason to doubt that some increase
in precision can always be attained.
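
The conversion to standard scores can be illustrated with a small sketch. The two judges and their ratings are hypothetical, and the helper names are our own; the point is only that two judges who order the statements alike but use the scale differently disagree in raw ratings and agree once each judge's ratings are standardized.

```python
from statistics import mean, pstdev


def to_standard_scores(ratings):
    """Convert one judge's ratings to standard (z) scores."""
    m, s = mean(ratings), pstdev(ratings)
    if s == 0:
        raise ValueError("judge gave every statement the same rating")
    return [(r - m) / s for r in ratings]


def spread_per_statement(ratings_by_judge):
    """Standard deviation, across judges, of the values given each statement."""
    return [pstdev(column) for column in zip(*ratings_by_judge)]


# Hypothetical ratings of five statements on a 9-point scale.
raw = [
    [1, 3, 5, 7, 9],   # judge 1 uses the whole scale
    [4, 5, 6, 7, 8],   # judge 2 keeps the same order but bunches the ratings
]
standardized = [to_standard_scores(judge) for judge in raw]

raw_spread = spread_per_statement(raw)          # in raw scale units
z_spread = spread_per_statement(standardized)   # in standard-score units
print(raw_spread)
print([round(s, 6) for s in z_spread])
```

The two spreads are expressed in different units, so the comparison is a rough one; it is meant only to show why standardizing removes disagreement that comes from each judge's personal use of the scale.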


Position or Number Preference. We present some data in Table 32
to show that judges can have position or number preferences. In the
study which led to the collection of these data, there was no reason
for a greater number of statements to have been assigned to position
3 rather than to position 4 or to position 7 rather than to position
6. The greater number of ratings in positions 3 and 7, as compared
with those in positions 4 and 6, represents nothing more than a
number or position preference.

Table 32. Distributions of Ratings Secured in Evaluating Statements for an
Equal-appearing-interval Scale

                 Obtained percentage
Scale value      Set A        Set B        Expected percentage
    9             6.00         5.62              4.00
    8             8.45         8.46              7.00
    7            12.34        13.49             12.00
    6            11.85        12.05             17.00
    5            16.09        15.78             20.00
    4            12.64        11.50             17.00
    3            14.19        13.97             12.00
    2            10.60        11.04              7.00
    1             7.84         8.09              4.00
Totals          100.00       100.00            100.00

It can be said, of course, that judges may lack sufficient insight to 
make the fine discriminations required. This may well be true, but if 
so, this fact allows position or number preference to have its effect. 
If judges could be more discriminating, their evaluations would not 
be subject to the errors caused by position or number preference. 
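
The preference for positions 3 and 7 can be read directly from the Table 32 figures. The sketch below simply subtracts the expected from the obtained percentages; the percentages are those of Table 32, while the variable and function names are our own.

```python
# Percentages from Table 32, keyed by scale position.
expected = {9: 4.0, 8: 7.0, 7: 12.0, 6: 17.0, 5: 20.0,
            4: 17.0, 3: 12.0, 2: 7.0, 1: 4.0}
set_a = {9: 6.00, 8: 8.45, 7: 12.34, 6: 11.85, 5: 16.09,
         4: 12.64, 3: 14.19, 2: 10.60, 1: 7.84}
set_b = {9: 5.62, 8: 8.46, 7: 13.49, 6: 12.05, 5: 15.78,
         4: 11.50, 3: 13.97, 2: 11.04, 1: 8.09}


def excess(obtained, expected):
    """Obtained minus expected percentage for every scale position."""
    return {pos: round(obtained[pos] - expected[pos], 2)
            for pos in sorted(expected)}


for label, obtained in (("Set A", set_a), ("Set B", set_b)):
    diff = excess(obtained, expected)
    print(label,
          "| positions 3 and 4:", diff[3], diff[4],
          "| positions 7 and 6:", diff[7], diff[6])
```

In both sets positions 3 and 7 run above expectation while their neighbors 4 and 6 fall well below it.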

Personal Attitudes of the Judges. It does not seem unreasonable 
from a common-sense standpoint to suppose that the attitude of a 
rater should have some effect upon his evaluations of a set of state- 
ments. However, Thurstone and Chave, in their original monograph, 
took the position that these evaluations would prove to be
independent of the attitude of the raters. Four studies have now proved
this to be the case.

The first of these four studies was conducted by H. C. Beyle. He
asked two groups of subjects, differentiated in their attitudes toward
Alfred E. Smith, to evaluate statements intended for use in a scale
to measure attitudes toward Alfred E. Smith during the 1928
presidential election campaign. He found no appreciable difference in the
scale values assigned by these two groups of judges.

Hinckley was the second investigator to study the problem.
Hinckley asked a group of Negro students and two groups of white
students to evaluate 114 statements from which a scale for measuring
attitude toward the Negro was to be constructed. One of the white
groups consisted of Southern students openly antagonistic toward
Negroes, and the other consisted of Northern students professing
to have attitudes favorable to Negroes. Hinckley used the responses
of these three groups separately and constructed three scales. These
three scales proved to be identical in content, and only one
statement was found not to occupy the same relative position in all three
scales.

Ferguson asked groups of R.O.T.C. students and other Stanford
University students, and groups of Epworth Leaguers (now known
as the Methodist Youth Fellowship) in Berkeley and San Jose,
California, to take Form A of Peterson’s scale for the measurement
of attitude toward war and then to evaluate, by means of the
paired-comparisons method (see Chap. 12), the relative values of the
statements in Form B of this same scale. On the basis of their answers to
Form A, Ferguson divided his subjects into three groups differing
significantly in their attitude toward war. Then, separately for each
of these groups, he determined the mean scale values assigned to the
statements in Form B of the Peterson scale. He found that the scale
values assigned by these three groups correlated .96 or higher with
each other and .97 with those published by Peterson.
Finally, at Columbia University, Pintner and Forlano asked 411
students to take Form A of the Thiele and Thurstone scale for the
measurement of attitude toward patriotism and to evaluate, by the
method of equal-appearing intervals, the statements in Form B of
this same scale. Upon the basis of . . . per cent. Pintner and Forlano
. . . the scale values assigned by these three groups of judges . . .


Except for a complicating factor discussed in the next section, we
can say that these four studies effectively dispose of the question
involved. We can conclude that Thurstone and Chave’s assumption
is correct. The attitude of a rater has no effect upon, and is not
related to, his evaluations of the statements for an
equal-appearing-interval attitude scale.

Cultural Context. Farnsworth, in discussing the results of the 
foregoing studies, raises a question as to whether scale values secured 
at different time periods will be affected by changing cultural in- 
fluence. To answer this question he asked a group of Stanford Uni- 
versity students to give their opinions on the values of the items in 
Form A of Peterson’s scale for the measurement of attitude toward 
war. He did this in 1940-41 and then compared the evaluations he 
secured with those secured by Peterson in 1930-31. Farnsworth 
found a high degree of correlation (.97) between the new and the 
old series of scale values. But he also found a sizable difference in
mean scores from the use of these two series of values. This led
Farnsworth to suggest that over the decade 1930-31 to 1940-41 the
significance of the items in Peterson’s scale was affected by cultural
change.

If Farnsworth had been content with the computation of the
intercorrelation between the two series of scale values, it is very
likely that he would have come to the conclusion that cultural
change is of no importance insofar as its effect upon the scale values
of the statements in an equal-appearing-interval attitude scale is
concerned. Farnsworth was able to detect the effects he suspected
only by a comparison of mean scores.

This result brings into some question the adequacy of the
conclusion reached, albeit independently, by Beyle, by Hinckley, by
Ferguson, and by Pintner and Forlano. None of these investigators
reported mean-score differences: They confined themselves to the
computation and to the consideration of the intercorrelations among
different series of scale values. It is possible, therefore, that their
conclusion would stand in need of some modification were
mean-score comparisons, as suggested by Farnsworth’s study, to be
effected.
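
Why a high correlation can hide a shift of this kind is easy to show with a small sketch. The two series of scale values below are hypothetical, not Peterson’s or Farnsworth’s: the second is the first moved upward by a constant, so the correlation is perfect while the means differ.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Product-moment correlation between two equally long series."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = sqrt(sum((b - my) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical scale values for seven statements at two dates.
old_values = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 10.0]
new_values = [value + 0.8 for value in old_values]  # uniform upward shift

r = pearson(old_values, new_values)
mean_shift = mean(new_values) - mean(old_values)
print(round(r, 3), round(mean_shift, 2))
```

The correlation reflects only the relative order and spacing of the values, which is why a comparison of means is needed to detect a change in level.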


Personal Issue. In the Hinckley study discussed in the section on
the personal attitudes of the judges, both Negro and white students
evaluated statements in a scale proposed to measure attitude toward
the Negro. In addition to differing attitudes, it is possible that ego
involvement, particularly on the part of the Negro subjects, may
have been a factor in determining the ratings assigned. However, the
negative results secured by Hinckley would seem to indicate that
neither the difference in attitudes nor the difference in ego
involvement caused any material change in the relative scale values
assigned.

Status of Raters. Another factor that may affect statement
evaluation is group status. In a study by Ferguson, three groups, managers,
assistant managers, and agents, evaluated statements from which
an assistant managerial evaluation form was to be constructed. It
was found that the judges from all three groups rated or evaluated
the statements in approximately the same way. Thus, the
ego-involved assistant manager group, the superior status group
(managers), and the inferior status group (agents) all rated the statements
in the same way. Therefore, in the Thurstone equal-appearing-interval
procedure we can disregard group status as a factor in
statement evaluation.

Selection of Final Statements. When mean or median scale
values have been determined, we shall find it advisable to arrange . . .

Table 33. . . . designed by Thurstone and Chave to Measure
Attitude toward the Church*

Statement   Scale    Q
number      value    value   Statement
   65       11.0     1.4     I think the church is a parasite on society.
   18       10.8     1.8     I am an atheist and have no use for the church.
   72       10.7     1.7     I think the organized church is an enemy of science
                             and truth.
   96       10.5     1.9     I regard the church as a static, crystallized
                             institution, and as such it is unwholesome and
                             detrimental to society and the individual.
   25       10.5     1.6     I believe the churches are doing far more harm than
                             good.
  108       10.5     1.6     I believe the church is full of hypocrites and have
                             no use for it.
   41       10.5     1.0     I think the country would be better off if the
                             churches were closed and the ministers . . .
   48       10.4     1.4     . . .

* Adapted from Thurstone, L. L., and Chave, E. J. The Measurement of Attitude.
Chicago: University of Chicago Press, 1929.

Our task is to choose some small number of these statements to
represent each position along the attitude continuum.

The total number of statements to be selected must be determined 
in an arbitrary manner. Thurstone and his collaborators have 
generally tended to use 20 statements. The author has used 20, 26, 
52, and 78 statements. Uhrbrock has used 50 statements, and other 
investigators have used still other numbers. In making a decision 
on this point, we must keep in mind the fact that the total number of 
statements we select will determine for us the number of statements 
that we can have to represent each interval on the continuum. If 
we choose a total of 33 statements for a continuum divided into 11 
intervals, we can have each of these intervals represented three 
times. If we choose 22 statements, we can have each interval repre- 
sented only twice. Another factor to keep in mind is the effect of the 
length of any psychological test upon the reliability of the scores 
which it yields. In general, and within certain limits, the longer our 
test, the more reliable the scores based on it. 

Thurstone and Chave decided that they wanted 45 statements in 
their scale for the measurement of attitude toward the church. 
Therefore, each of their 11 intervals could be represented about 4 
times. Actually, 2 of their intervals have 3 statement representatives, 
1 has 5, and 1 has 6. This leaves 7 intervals that are represented by 
4 statements each. The statements which Thurstone and Chave se- 
lected to represent the interval 10.0 to 11.0 were numbers (see Table 
33) 65, 72, 96, 41, and 48. Their median values are 11.0, 10.7, 10.5, 
10.5, and 10.4. These values average 10.6, so, collectively, they can 
be considered adequate representatives of the interval with which we 
are concerned. 

In the process of selecting our final list of statements, it is gen- 
erally desirable to give thought to the preparation of at least two 
alternative forms of our scale. An easy way to do this would be to 
arrange them in order according to scale value, number them, and 
then assign all odd-numbered statements to Form A and all even- 
numbered statements to Form B. This procedure would not be 
entirely satisfactory, however. It would result in the allocation of 
the higher valued statement in each odd-even pair of statements to 
one form and in the allocation of the lower valued statement in 
each odd-even pair of statements to the alternate form. Thurstone 
and Chave solved this problem by assigning the higher valued
statement in their first pair of statements to Form A and the lower
valued statement to Form B. Then, in their second pair of state- 
ments they assigned the higher valued statement to Form B and the 
lower valued statement to Form A. Alternating in this way, they 
avoided the possibility that all of the higher valued statements 
would be assigned to one form and all the lower valued statements 
to the other form. 
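
The alternating assignment can be sketched as follows. The statement numbers and scale values are hypothetical, and the function is our own illustration of the alternation described above rather than a record of how Thurstone and Chave carried it out.

```python
def split_into_forms(statements):
    """Assign (id, scale value) pairs to two forms by alternating pairs.

    Statements are ordered by scale value and taken two at a time. The
    higher valued member of the first pair goes to Form A, of the second
    pair to Form B, and so on. A leftover odd statement stays unassigned.
    """
    ordered = sorted(statements, key=lambda item: item[1], reverse=True)
    form_a, form_b = [], []
    for start in range(0, len(ordered) - 1, 2):
        higher, lower = ordered[start], ordered[start + 1]
        if (start // 2) % 2 == 0:
            form_a.append(higher)
            form_b.append(lower)
        else:
            form_b.append(higher)
            form_a.append(lower)
    return form_a, form_b


# Hypothetical statements: (statement number, scale value).
statements = [(1, 8.0), (2, 7.0), (3, 6.0), (4, 5.0),
              (5, 4.0), (6, 3.0), (7, 2.0), (8, 1.0)]
form_a, form_b = split_into_forms(statements)
mean_a = sum(value for _, value in form_a) / len(form_a)
mean_b = sum(value for _, value in form_b) / len(form_b)
print(form_a, mean_a)
print(form_b, mean_b)
```

With these values both forms average 4.5, whereas a plain odd-even split of the same list would give means of 5.0 and 4.0.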

Ambiguous Statements. Our measure of statement ambiguity is the
quartile deviation of the distribution of ratings given to each
statement. We have already indicated its method of computation.
Obviously we want statements with precise and clear-cut meanings. This
is not an all-or-none proposition, however, so we must have some
standard by which we can select the less ambiguous statements in
contrast with the more ambiguous ones. Therefore, if we have 6
statements available and need to use only 3 of them, we should
choose the 3 statements with the lowest degree of ambiguity. We do
this by choosing the 3 statements with the smallest quartile
deviations. Table 33 shows the quartile deviations for 8 of the
Thurstone-Chave statements concerning attitude toward the church. We find
from these data that the average quartile deviation for the
statements retained is 1.48, while that of the 3 statements not retained is
1.67. Thurstone and Chave tended to select the less ambiguous
statements, yet one of the statements retained has a large quartile
deviation. This illustrates that other factors besides degree of
ambiguity are also taken into account.
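
The two averages quoted above can be checked from the Table 33 figures, and the same data show what a selection by ambiguity alone would look like. The Q values and the list of retained statements come from the text; the function names are our own.

```python
# Quartile deviations (Q values) from Table 33, keyed by statement number.
q_values = {65: 1.4, 18: 1.8, 72: 1.7, 96: 1.9,
            25: 1.6, 108: 1.6, 41: 1.0, 48: 1.4}
retained = [65, 72, 96, 41, 48]   # statements kept by Thurstone and Chave
not_retained = [s for s in q_values if s not in retained]


def mean_q(statement_ids):
    """Average quartile deviation of the given statements."""
    return sum(q_values[s] for s in statement_ids) / len(statement_ids)


def least_ambiguous(q, how_many):
    """The `how_many` statements with the smallest quartile deviations."""
    return sorted(q, key=q.get)[:how_many]


print(round(mean_q(retained), 2), round(mean_q(not_retained), 2))
print(least_ambiguous(q_values, 3))
```

Statement 96, with the largest Q value of the eight, was nevertheless retained, which is the exception the text points out.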

The number of statements scaled at the various intervals of our
. . . bound to vary. For some intervals . . . statements, for some there
will be . . . some, perhaps, not really enough. Therefore, . . . of
ambiguity cannot, in all probability, be . . . all intervals of the
attitude continuum.

. . . now ready to give our scale an experimental . . . appropriate group
of subjects and ask them to indicate the statements with which they
agree and the statements with which they disagree. When we get these
data, we . . . call the criterion of statement irrelevance. This
criterion of irrelevance is an index which shows the degree to which
each statement . . . of subjects (and in fact the same individuals) that
endorse statement A also ought to endorse statement B. Contrariwise, if
statements C and D represent widely divergent scale positions, the
individuals who endorse statement C should not endorse statement D, and
vice versa. We compute the criterion of irrelevance, for statement A, in
accord with the formula C/B. In this formula, C represents the
number of subjects who endorse both statements A and B, and B
represents the number of subjects who endorse statement B only.


Fig. 2. Irrelevancy values for a statement which passes the irrelevancy test;
the index of similarity is plotted against scale value for statement 96. (From
Thurstone, L. L., and Chave, E. J. The Measurement of Attitude. Chicago:
University of Chicago Press, 1929.)


If all subjects who endorse statement A also endorse statement B, 
our index will be unity. If none of the individuals who endorse state- 
ment A endorse statement B, our index will be 0. 
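
The two limiting cases can be checked with a small sketch. The denominator used here, the number of subjects who endorse statement A, is our assumption: it is chosen because it gives an index of unity when every endorser of A also endorses B and an index of 0 when none does, and it may not match the exact computation used by Thurstone and Chave. The subjects and their endorsements are hypothetical.

```python
def similarity_index(endorsements, a, b):
    """Share of the subjects endorsing statement `a` who also endorse `b`.

    `endorsements` is a list with one set of endorsed statement numbers per
    subject. The choice of denominator is an assumption made for this sketch.
    Returns None when nobody endorses `a`.
    """
    endorsers_of_a = [chosen for chosen in endorsements if a in chosen]
    if not endorsers_of_a:
        return None
    both = sum(1 for chosen in endorsers_of_a if b in chosen)
    return both / len(endorsers_of_a)


# Hypothetical subjects; statements 1 and 2 are close in scale value,
# statement 4 lies far away on the continuum.
endorsements = [{1, 2}, {1, 2}, {1, 2, 3}, {3, 4}, {4}]

print(similarity_index(endorsements, 1, 2))   # every endorser of 1 endorses 2
print(similarity_index(endorsements, 1, 4))   # no endorser of 1 endorses 4
print(similarity_index(endorsements, 1, 3))   # an intermediate case
```

Computing the index for every ordered pair of statements gives the values that are plotted, statement by statement, against scale value.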

It is necessary to compute our criterion of irrelevance N(N - 1)
times. In a 20-item scale we have 20 × 19 (or 380) criterion values
to compute. We need to compute it 19 times for statement 1, 19
times for statement 2, 19 times for statement 3, and so on. When we
have made these computations, we prepare a series of graphs such
as those shown in Figs. 2 and 3. By inspecting these graphs we can
determine which of the items, if any, do not meet our criterion, and
we shall eliminate these items from our scale.

Figure 2 shows the plot for a statement which has a scale value of
10.5 and which passes the irrelevancy test. It passes the test because
items having similar scale values are endorsed, in the main, by the
same subjects. It is not endorsed by subjects who endorse statements
with markedly different scale values.

Figure 3 shows the plot for a statement which has a scale value of
4.1 and which does not pass our criterion. It does not pass, because
the numerical value of the index remains fairly constant, and this
is true regardless of the scale value of the item being compared with
the one whose irrelevancy is at stake. The graph shows that subjects
endorsing the statement in question are no more likely to endorse
other statements closely allied in scale value than they are
statements far removed in scale value. We conclude the item to be
irrelevant and shall eliminate it from our scale.

Fig. 3. Irrelevancy values for a statement which fails the irrelevancy test;
the index of similarity is plotted against scale value for statement 23. (From
Thurstone, L. L., and Chave, E. J. The Measurement of Attitude. Chicago:
University of Chicago Press, 1929.)

In their monograph, Thurstone and Chave express some
dissatisfaction with this criterion of irrelevance and suggest that a better
one should be developed. Guttman, among others, has accepted this
challenge and has described for this purpose what he calls a technique
of scalability. We shall not discuss this technique here, however, as
it can better be described in connection with Likert’s method of
attitude-scale construction (see Chap. 5).

In the meantime, we can point to one other technique of
determining item irrelevancy. This is factor analysis. We can show by this
technique the extent to which the various items in our scale con- 
tribute to some one major factor underlying item intercorrelations. 
Statements that do not contribute significantly to such a factor can 
be considered irrelevant and can be eliminated from our scale. 
Collection of Norms. There is nothing in this operation unique 
to the Thurstone method of equal-appearing intervals. The only 
requirement is that we secure distributions of scores for the groups 
we consider most adequate as standardization groups. For their 
scale measuring attitude toward the church, Thurstone and Chave 
secured distributions for 16 groups of subjects. Most of these were 
student groups at the University of Chicago. The groups represented 
were freshmen, sophomores, juniors, seniors, graduate students, 
divinity students, and members of the Chicago forum. The remaining 
groups, consisting of a reclassification of the subjects in the groups 
already mentioned (and not mutually exclusive), are Catholics, 
Protestants, and Jews; churchgoers and nonchurchgoers; and active 
and inactive church members. Tables 34 and 35 (in which low scores
indicate attitudes favorable to the church) show the distributions of
scores for certain of these groups of subjects and illustrate the types
of normative data Thurstone and Chave provided.

Table 34. Attitude toward the Church According to Religious Preference*

                   Catholic          Protestant        Jewish
Attitude score     N       %         N       %         N       %
1.0-1.9            28     38.8       37      8.0        2      1.1
2.0-2.9            19     26.3      149     32.1       17      9.7
3.0-3.9            13     18.1       92     19.9       22     12.5
4.0-4.9             3      4.2       58     12.5       31     17.6
5.0-5.9             3      4.2       47     10.2       34     19.4
6.0-6.9             3      4.2       44      9.5       33     18.7
7.0-7.9             2      2.8       25      5.4       24     13.7
8.0-8.9            ..      ..        10      2.2        9      5.1
9.0-9.9             1      1.4        1      0.2        4      2.3
Totals             72    100.0      463    100.0      176    100.0
Mean              2.90              3.97              5.44

* From Thurstone, L. L., and Chave, E. J. The Measurement of Attitude. Chicago:
University of Chicago Press, 1929.
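
A group mean of the kind reported with these norms can be recovered from the grouped frequencies alone. The sketch below uses the Catholic and Protestant frequencies of Table 34 and assumes, as a convention of our own, that every score in an interval sits at its midpoint (1.5 for the interval 1.0-1.9, and so on); under that assumption the printed means of 2.90 and 3.97 are reproduced.

```python
# Lower limits of the score intervals 1.0-1.9, 2.0-2.9, ..., 9.0-9.9.
lower_limits = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Frequencies from Table 34 (no Catholic subject scored in 8.0-8.9).
catholic = [28, 19, 13, 3, 3, 3, 2, 0, 1]
protestant = [37, 149, 92, 58, 47, 44, 25, 10, 1]


def grouped_mean(frequencies, lower_limits, width=1.0):
    """Mean of a grouped distribution, placing each score at its midpoint."""
    total = sum(frequencies)
    weighted = sum(f * (low + width / 2)
                   for f, low in zip(frequencies, lower_limits))
    return weighted / total


print(sum(catholic), round(grouped_mean(catholic, lower_limits), 2))
print(sum(protestant), round(grouped_mean(protestant, lower_limits), 2))
```

The same computation applied to any standardization group gives a single figure against which an individual score can be compared.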
Table 35. Attitude toward the Church According to Membership and Attendance*

                 Active church members             Church attendance
Attitude         Yes             No                Yes             No
score          N       %       N       %         N       %       N       %
 1.0- 1.9     106    18.2      13     1.7       119    17.6       6     0.9
 2.0- 2.9     245    42.3      81    10.4       282    41.6      40     5.8
 3.0- 3.9     114    19.6     106    13.6       155    22.9      75    10.8
 4.0- 4.9      54     9.3     106    13.6        66     9.7      98    14.2
 5.0- 5.9      33     5.7     108    13.8        30     4.4     120    17.3
 6.0- 6.9      20     3.4     139    17.7        15     2.2     148    21.3
 7.0- 7.9       7     1.2     116    14.8         7     1.0     100    14.5
 8.0- 8.9       2     0.3      84    10.8         4     0.6      81    11.7
 9.0- 9.9      ..     ..       26     3.3        ..     ..       22     3.2
10.0-10.9      ..     ..        2     0.3        ..     ..        2     0.3
Totals        581   100.0     781   100.0       678   100.0     692   100.0
Mean         3.09            5.66              3.06            5.93

* From Thurstone, L. L., and Chave, E. J. The Measurement of Attitude. Chicago:
University of Chicago Press, 1929.


Our scale is now complete and ready for use. The steps we have 
been through are as follows: 


1. We collected a long list of statements.

2. We had a number of judges evaluate these statements.

3. We determined their scale values.

4. We eliminated the more ambiguous statements.

5. We eliminated irrelevant statements.

6. We selected a final list of statements.

7. We prepared tables of norms.

Our two remaining tasks are to determine reliability and validity.


RELIABILITY AND VALIDITY 


We can determine reliability in either one of two ways: by the 
usual split-half technique or, and this is more usual, by determining 
the correlation between the alternate forms of our scales. Thurstone 
has claimed that the reliabilities of all scales under his editorship 
are in excess of .80, but other investigators have sometimes found 
lower reliabilities. Table 36 contains a sampling of the coefficients
reported by Likert, Roslow, and Murphy, by Lorge, and by Ferguson.
TABLE 36. Reliability Coefficients for Thurstone Attitude Scales*

Attitude scale                                          Low    High

Likert, Roslow, and Murphy
  Attitude toward birth control ....................    .62    .93
  Attitude toward the Chinese ......................    [?]    .86
  Attitude toward communism ........................    .66    .93
  Attitude toward evolution ........................    .67    .86
  Attitude toward the Germans ......................    .42    .59
  Attitude toward God
    Belief in the reality of .......................    .79    .93
    Influence on conduct ...........................    .84    .92
  Attitude toward the Negro ........................    .57    [?]
  Attitude toward war
    Droba's scale ..................................    [?]    .83
    Peterson's scale ...............................    .70    .86
Lorge
  Attitude toward the Bible ........................    [?]    .83
  Attitude toward birth control ....................    .68    .84
  Attitude toward capital punishment ...............    .59    .88
  Attitude toward censorship .......................    .65    .82
  Attitude toward the Chinese ......................    .39    .67
  Attitude toward the Constitution .................    [?]    .84
  Attitude toward communism ........................    .81    .95
  Attitude toward evolution ........................    .71    .92
  Attitude toward the Germans ......................    [?]    .58
  Attitude toward God (belief in reality of) .......    .81    [?]
  Attitude toward the Negro ........................    [?]    [?]
  Attitude toward patriotism .......................    [?]    .83
  Attitude toward Sunday observance ................    .73    .83
  Attitude toward the treatment of criminals .......    .69    .76
  Attitude toward war (Peterson) ...................    .44    .84
Ferguson
  Attitude toward birth control ....................    [?]    .84
  Attitude toward capital punishment ...............    .79    .88
  Attitude toward censorship .......................    .72    .84
  Attitude toward communism ........................    .78    .88
  Attitude toward evolution ........................    .82    .90
  Attitude toward God (belief in reality of) .......    [?]    [?]
  Attitude toward law ..............................    [?]    [?]
  Attitude toward the treatment of criminals .......    [?]    .73
  Attitude toward war ..............................    .62    [?]
* From Ferguson, L. W. A revision of the primary social attitude scales. J. Psychol., 1944,
17, 229-241. From Likert, R., Roslow, S., and Murphy, G. A simple and reliable method of
scoring the Thurstone attitude scales. J. soc. Psychol., 1934, 5, 228-238. From Lorge, I. The
Thurstone attitude scales: I. Reliability and consistency of rejection and acceptance. J. soc.
Psychol., 1939, 10, 187-198.
Thurstone and Chave discuss the validity of their scale to measure
attitude toward the church in terms of its correlation with self- 
ratings, in terms of its differentiation of religious groups, and in 
terms of its differentiation of active church members from inactive 
church members. They point out that scores on the attitude scale 
correlate .67 with self-ratings, that Catholics secure more favorable 
scores than Jews, and that active church members attain more 
favorable scores than inactive church members. 

Ida B. Kelley, working with Remmers, determined the validity 
of the scale to measure attitude toward Sunday observance by 
comparing the scores of Seventh-Day Adventists with other de- 
nominational groups. Stouffer and Hattie N. Smith have made 
comparisons between attitude scores and case-history studies, while
other investigators have relied upon the correlations with other
scales designed to measure the same or similar attitudes.


EXTENSIONS OF THURSTONE’S TECHNIQUE 


Attitude scales developed in accord with the Thurstone equal- 
appearing-interval procedure have been put to many uses. There are 
more than 500 references to them in the literature in which they 
have played an important part. In these studies, Thurstone-type 
attitude scales have been used to determine the effects of movies 
on attitudes toward crime and toward nationality groups. They have 
been used to determine the effects of social-science courses upon 
student attitudes. They have been used in studies designed to 
determine the origin of attitudes. They have been used to determine 
the relative effectiveness of written and oral propaganda. They have 
been used to determine the effect of college attendance upon atti- 
tudes. They have been used to determine important attitude 
intercorrelations. They have been used to determine the degree of 
employee morale. And, finally, they have been used as a basis for
various systems of employee appraisal. We could cite many more
illustrations, but these will suffice. It would be interesting
for us to review many of these studies, but since they
involve no advance in the technique of measurement, they fall
outside the intended scope of this volume.

There have been two major attempts to extend the usefulness of
the Thurstone technique. These attempts have been made by Remmers
and by Ferguson. Both of these investigators have tried to
generalize the technique but in markedly different ways.

Remmers. This investigator has generalized the Thurstone tech- 
nique by preparing a type of scale that can be used to measure 
attitudes toward a great many objects of some designated class of 
objects. For example, he had Ida B. Kelley develop a scale to meas- 
ure attitude toward any social institution. This scale must be com- 
pleted separately for each social institution toward which we wish 
to know a person’s attitude, but the same statements are used 
regardless of the institution involved. 

A strict application of the Thurstone approach requires the 
development of a scale for each attitudinal object in which we are 
interested. If we are interested in attitudes toward communism and 
in attitudes toward war, we need to develop one scale to measure 
attitude toward communism and another scale to measure attitude 
toward war. Remmers has attempted to short-cut this process by 
developing one general scale by means of which we can measure 
attitude toward communism, attitude toward war, and, in fact, 
attitude toward any social institution. A copy of one form of this 
scale is shown in Table 37. Subsequent to the development of this 
first scale, Remmers and his collaborators developed the additional 
scales listed in Table 38.

In each instance, the name of the school subject, homemaking 
activity, or poem must be indicated. But the same statements are 
used for any member of the appropriate attitudinal group. 

The problems involved in the development of a Remmers-type 
attitude scale are, for the most part, the same as those involved in 
developing a Thurstone-type scale. One new problem arises, however.
This revolves around the extent to which one series of statements
can be made applicable to all of the objects in one attitudinal
class. For example, can one set of statements be used in measuring
attitudes toward marriage and attitudes toward our penal system,
both of which are social institutions? We have cited an extreme
example for the sake of pointing up the problem, and in this case, the
answer is probably "No." Remmers's own data, however, are conflicting.
When Kelley's scale was used to measure attitudes toward
communism and Sunday observance, correlations of .98 and .83 with
the corresponding Thurstone scales were obtained. When it was used
to measure attitude toward war, a correlation of -.15 was obtained.

TABLE 37. Statements in Form A of Kelley's Scale to Measure Attitude toward any
Institution*

 1. Is perfect in every way.
 2. Is the most admirable of institutions.
 3. Is necessary to the very existence of civilization.
 4. Is the most beloved of institutions.
 5. Represents the best thought in modern life.
 6. Grew up in answer to a felt need and is serving that need perfectly.
 7. Exerts a strong influence for good government and right living.
 8. Has more pleasant things connected with it than any other institution.
 9. Is a strong influence for right living.
10. Gives real help in meeting moral problems.
11. Gives real help in meeting social problems.
12. Is valuable in creating ideals.
13. Is necessary to the very existence of society.
14. Encourages social improvement.
15. Serves society, as a whole, well.
16. Aids the individual in wise use of leisure time.
17. Is necessary to society as organized.
18. Adjusts itself to changing conditions.
19. Is improving with the years.
20. Does more good than harm.
21. Will not harm anybody.
22. Inspires no definite likes or dislikes.
23. Is necessary only until a better can be found.
24. Is too liberal in its policies.
25. Is too conservative for a changing civilization.
26. Does not consider individual differences.
27. Is losing ground as education advances.
28. Gives too little service.
29. Represents outgrown beliefs.
30. Gives no opportunity for self-expression.
31. Promotes false beliefs and much wishful thinking.
32. Is too selfish to benefit society.
33. Does more harm than good.
34. Is cordially hated by the majority for its smugness and snobbishness.
35. Satisfies only the most stupid with its services.
36. Is hopelessly out of date.
37. No one any longer has faith in this institution.
38. Is entirely unnecessary.
39. Is detrimental to society and the individual.
40. The world would be better off without this institution.
41. Is in a hopeless condition.
42. Will destroy civilization if it is not radically changed.
43. Never was any good.
44. Benefits no one.
45. Has positively no value.

* From Remmers, H. H. (Ed.) Studies in Higher Education XXVI, Studies in Attitudes.
Lafayette, Ind.: Purdue University, 1934.
TABLE 38. Equal-appearing-interval Attitude Scales
Edited by Professor H. H. Remmers

Attitude toward any disciplinary procedure
Attitude toward any group
Attitude toward any home-making activity
Attitude toward any play
Attitude toward any poem
Attitude toward any practice
Attitude toward any proposed social action
Attitude toward any school subject
Attitude toward any teacher
Attitude toward training
Attitude toward any vocation

Remmers attributes this low correlation, however, to the low degree
of reliability of the Thurstone scale. When Grice's scale to measure
attitude toward any group was used to measure attitude toward the
Negro and toward the Chinese, correlations of .98 and .77 with the
corresponding Thurstone scales were obtained. Finally, F. D. Miller
constructed a Thurstone-type scale to measure attitude toward
teaching and found that the scores on it correlated only .58 with the
scores on attitude toward teaching when measured by H. E. Miller's
scale to measure attitude toward any vocation. These varying
results suggest that generalized scales must be used with caution.
Some will give the same results as the corresponding Thurstone
scales, and some will not.

Table 39 presents a summary of the reliability data which Remmers
reports for his generalized scales. The lowest coefficient is .47,
the highest is .98, and the average is .75.

Remmers and his collaborators have made extensive use of these
generalized scales to ascertain some of the determinants of our social
attitudes. For example, they have attempted to determine how
children's attitudes toward law are influenced by their participation
in self-government, how the study of school subjects can affect
attitudes toward divorce, social insurance, capital punishment, and
labor unions, how propaganda can affect attitudes toward the
Negro, and how lecture material can affect attitudes toward
extranational political organization.

Ferguson. This investigator's attempt to generalize the Thurstone
technique proceeds along lines quite different from those used by
TABLE 39. Reliability Data for Remmers's Generalized Attitude Scales*


Scale N ras 
Attitude toward any group: 
Chinese 217 -77 
BAB AE AE E E, 217 „84 
Attitude toward any home-making activity: 
Caring for children A 320 -90 
yea | PRCT aA ON HE Epura E ETEA 320 -88 
Attitude toward any institutio 
SOTNI ean aao som oria e a i A 83 -89 
T ORG E T TA e YP „81 
TADON ORONS o senan hi ct Nils. 2: ccs x ai 92 -76 
Marriage.......... A 127 71 
Sunday observance. 222 98 
WI ST A A eetbes 80 77 
Attitude toward any p 
PEELE OCT Rac iat E ORBEA IEE wakes O. 230 5 
123 76 
84 65 
100 | .79 
65 -61 
60 86 
40 68 
Abolition of compulsory military training in college... 78 -91 
Abolition of township trustees in Indiana... OUUU UUS 78 81 
Compulsory sex education for adults... . 40 70 
Divorce 102 81 
100 A 
78 .78 
78 | .73 
269 81 
771 70 
705 68 
579 74 
223 60 
875 7 
293 76 
T era ee 261 47 
ROARS HD SHG ined anaes WERE Rain eo sina 620 66 
SEB anv amemensoens arses ns es) 182 || «82 i 
PACH SVOCTUON aese ana ee sij 292 g9 
High school teaching FR anges i) Don af 
Homemaker....... . i ae «| 428 pA 
Ministry......... ws x || AOL a 
Salesgirl. . a anaa E UE EEDE RA ORTH IE Aa aaan EE sty 308 -76 
PERN ties E E E E ee A ue de 
Unskilled laborer................ aie $ 
UNSW EF sosaren nena 266 Kyi 
* Adapted from Remmers, H. H. (Ed.) Studies in Higher Education XXVI, XXXI, and
XXXIV. Lafayette, Ind.: Purdue University, 1934, 193[?].
Remmers. He started with ten Thurstone-edited scales, computed
their intercorrelations, subjected these intercorrelations to several 
centroid factor analyses, and, upon the basis of these analyses, 
developed three new scales to do the work of the original ten. The 
steps in Ferguson’s line of attack were as follows: 


 1. The original scales were selected.
 2. Subjects were secured.
 3. The scales were scored.
 4. Intercorrelations among the scales were computed.
 5. These intercorrelations were subjected to several centroid factor analyses,
    and three factors were extracted.
 6. Scores on each of these factors were determined.
 7. Criterion groups scoring high and low on each factor were selected.
 8. The percentages of the criterion groups agreeing and disagreeing with each
    item were determined.
 9. Items for the new scales were selected.
10. Scoring weights for these items were assigned.
11. Scores on the new scales were computed.
12. Norms were prepared.


In the following paragraphs we shall review each of these steps. 
This should enable any reader, who wishes, to repeat the studies 
involved or to apply the same methods in the development of
additional generalized scales.

Selection of Attitude Variables. In pursuing this approach, the
scales to be used will depend upon the nature of the problem to be
investigated, upon the insight and hunches of the investigator, and
upon the availability of scales from which a choice can be made.
We can set forth no general rules to be followed, for the particular
demands of each investigation will dictate the choices to be made.
In the present instance the scales used were among those used by
Thelma Gwinn Thurstone and by H. B. Carlson in two earlier
factor-analysis studies. Ferguson wanted to see if certain of the
suggestions made by these investigators could be clarified and
extended. The scales he used are listed in Table 40.

Selection of Subjects. Choice of the subjects to be tested will
depend upon the purpose and scope of the investigation to be
conducted, upon their accessibility to an investigator, upon the cost
involved, and upon any special characteristics that may be desired.
The subjects used in the present instance were students at Stanford
University and at the University of Connecticut. Certain collateral
studies were based upon students from these and from 16 other
colleges and universities. In the major parts of the investigation the
data for the Stanford and Connecticut groups were separately
analyzed, were found to yield comparable results, and were then
combined. Most of the results which we shall report are based upon
the combined sets of data. Occasionally, however, it will be necessary
for us to make separate reference to the Stanford and Connecticut
groups.


TABLE 40. Equal-appearing-interval Scales Used in the Derivation of the Primary
Attitude Scales

Attitude toward evolution
Attitude toward birth control
Attitude toward God
Attitude toward capital punishment
Attitude toward the treatment of criminals
Attitude toward war
Attitude toward censorship
Attitude toward communism
Attitude toward law
Attitude toward patriotism


Scoring the Attitude Scales. We know from our earlier discussion 
on the subject that the score on a Thurstone-type attitude scale 
consists of the mean or median value of the statements with which 
a subject indicates agreement. Statements with which he indicates 
disagreement are disregarded in scoring. Each of the scales included 
in Ferguson's study contained 20 statements. And since he used a
total of 20 scales (two forms for each attitude variable), each subject
had to respond to a total of 400 statements. On each scale, Ferguson 
took as a subject’s score the mean value of the statements he 
endorsed. 
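The scoring rule just described can be sketched in a few lines. This is an illustration only: the function name `thurstone_score`, the statement numbers, and the scale values are hypothetical and are not taken from any published scale.

```python
from statistics import mean, median


def thurstone_score(scale_values, endorsed, use_median=False):
    """Score a Thurstone-type scale.

    scale_values: dict mapping statement number -> scale value.
    endorsed: statement numbers the subject agreed with.
    Statements the subject rejected are simply disregarded.
    Returns None when no statement was endorsed.
    """
    chosen = [scale_values[i] for i in endorsed]
    if not chosen:
        return None
    return median(chosen) if use_median else mean(chosen)


# Hypothetical scale values for a six-statement scale.
scale_values = {1: 0.8, 2: 2.3, 3: 3.1, 4: 5.6, 5: 7.4, 6: 9.2}
endorsed = [2, 3, 4]

mean_score = thurstone_score(scale_values, endorsed)
median_score = thurstone_score(scale_values, endorsed, use_median=True)
print(round(mean_score, 2), median_score)
```

The mean is the choice Ferguson made; the median is offered as the alternative mentioned in the text.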

Computation of Intercorrelations. In most instances the type of
correlation to be computed to show the relation between the scores
on two Thurstone attitude scales will be r, the Pearsonian coefficient
of linear correlation. To provide the necessary degree of accuracy in
factor analysis, these correlations must be carried to a minimum of
three decimal places, sometimes to a minimum of four. In the present
instance three decimals were considered sufficient. It is necessary
for our purposes here, however, to report the results only to the
second decimal, and they are so reported in Table 41.
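For readers who wish to carry out the computation, the following is a minimal sketch of the Pearsonian coefficient for two sets of scale scores. The function name `pearson_r` and the scores are hypothetical; the rounding at the end merely mirrors the distinction drawn above between the three-decimal working value and the two-decimal reported value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores of five subjects on two attitude scales.
scores_first = [2.1, 3.4, 4.0, 5.2, 6.8]
scores_second = [2.5, 3.0, 4.4, 5.0, 7.1]

r = pearson_r(scores_first, scores_second)
r_working = round(r, 3)   # value carried into the factor analysis
r_reported = round(r, 2)  # value shown in a summary table
print(r_working, r_reported)
```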


TABLE 41. Intercorrelations among 10 Thurstone Attitude Scales*
(N = 643 or 462)

                             War     Patr.†     God      Treat.   Cap. p.   Cens.†    Evol.   Birth c.    Law     Com.†
Scale               Form    A    B    A    B    A    B    A    B    A    B    A    B    A    B    A    B    A    B    A

War                     B  .62
Patriotism†             A  .18  .21
                        B  .11  .19  .60
God                     A -.06  .06  .24  .24
                        B  .00  [?]  .23  .26  .85
Treatment of criminals  A  .21  .17  .25  .25  .10  .12
                        B  .22  .18  .23  .24  .07  .07  .57
Capital punishment      A  .19  .15  .16  .16  .01  .00  .44  .46
                        B  .16  [?]  .19  .19  .02  .03  .47  .46  .79
Censorship†             A  .00  .02  .22  .18  .28  .28  .10  .12  .06  .11
                        B  .02  [?]  .20  .19  .23  .27  .10  .04  .03  .08  .72
Evolution               A -.07 -.12 -.17 -.20 -.44 -.48 -.07 -.04  .06  .06 -.19 -.21
                        B -.06  .10 -.17 -.20 -.46 -.50 -.09 -.08  .03  .05 -.25 -.26  .82
Birth control           A -.14 -.18 -.07 -.10 -.23 -.24 -.04 -.06 -.04 -.03 -.09 -.09  .38  .41
                        B -.09 -.14 -.09 -.11 -.23 -.25 -.06 -.04 -.02 -.02 -.18 -.18  .38  .40  .72
Law                     A  .04  .06  .30  .28  .26  .25  .18  .15  .10  .18  .34  .33 -.14 -.15 -.02 -.02
                        B -.02  .06  .19  .19  .19  .18  .16  .18  .07  .17  .29  .32 -.07 -.07  .04 -.01  .47
Communism†              A -.06 -.05 -.32 -.28 -.22 -.22 -.25 -.27 -.19 -.19 -.25 -.25  .23  .20  .08  .10 -.27 -.23
                        B  [?] -.05 -.37 -.29 -.27 -.28 -.27 -.24 -.26 -.25 -.27 -.28  .25  .24  .10  .13 -.30 -.27  .78

* From Ferguson, L. W. A revision of the primary social attitude scales. J. Psychol., 1944, 17, 229-241.
† Correlations for the variables marked with dagger are based upon 462 cases.
We have already indicated that the scales used in this study were
available in two alternate forms. This leads to four coefficients show- 
ing the relationship between each pair of attitude variables. For 
example, we have the correlation between Form A of the first vari- 
able and Form A of the second variable; we have the correlation 
between Form A of the first variable and Form B of the second 
variable; we have the correlation between Form B of the first 
variable and Form A of the second variable; and, finally, we have the 
correlation between Form B of the first variable and Form B of the 
second variable. Table 41 shows a remarkable similarity among the 
four coefficients in each such set. For example, the coefficients 
expressing the relationship between attitude toward censorship 
and attitude toward patriotism are .18, .19, .20, and .22. In Table 
41 there are only six sets of coefficients in which the individual 
coefficients differ by more than .05 from the set average. 
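The check on the agreement among the four coefficients in a set can be sketched as follows. The function name `set_spread` is hypothetical; the four values are those quoted above for censorship and patriotism, and the .05 limit is the one used in the text.

```python
def set_spread(coefficients):
    """Return the set average and the largest deviation from that average."""
    average = sum(coefficients) / len(coefficients)
    largest_deviation = max(abs(c - average) for c in coefficients)
    return average, largest_deviation


# Form A-A, A-B, B-A, and B-B coefficients for censorship and patriotism.
censorship_patriotism = [0.18, 0.19, 0.20, 0.22]

average, largest_deviation = set_spread(censorship_patriotism)
within_limit = largest_deviation <= 0.05
print(round(average, 4), round(largest_deviation, 4), within_limit)
```

For this set the average is .1975 and no coefficient lies more than about .02 from it, so the set falls well inside the .05 limit.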


TABLE 42. Summary of Factor Loadings*

                                  Factor I       Factor II      Factor III      Mean
Attitude scale           Form    1&2    1&3     1&2    2&3     1&3    2&3     loadings

God                       A     -.66   -.66                                     -.66
                          B     -.70   -.70                                     -.70
Evolution                 A      .76    .76                                      .76
                          B      .80    .80                                      .80
Birth control             A      .57    .57                                      .57
                          B      .57    .57                                      .57
War                       A                     [?]    [?]                      -.46
                          B                    -.41   -.43                      -.42
Treatment of criminals    A                    -.56   -.56                      -.56
                          B                    -.63   -.61                      -.62
Capital punishment        A                     [?]    [?]                      -.68
                          B                    -.69   -.64                      -.66
Patriotism                A                                     .55    .49       .52
                          B                                     .50    .44       .47
Censorship                A                                     .58    .58       .58
                          B                                     .59    .59       .59
Law                       A                                     .57    [?]       .56
                          B                                     .49    [?]       .50
Communism                 A                                     [?]    [?]       [?]
                          B                                    -.67   -.68      -.68

* From Ferguson, L. W. A revision of the primary social attitude scales. J. Psychol., 1944,
17, 229-241.
Factor Analyses. There are several methods of factor analysis.
Among the most important are Burt's summation method, Hotelling's
principal component method, and Thurstone's centroid
method. For most problems Thurstone's method will be found the
easiest to apply and the easiest to understand. It gives an idea of the
smallest number of factors we need to postulate to account for the
intercorrelations which we subject to analysis. In the present
instance, using Thurstone's method, three factors were found to be
sufficient to explain the intercorrelations. The results are given in
Table 42 and in Fig. 4.
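The first step of a centroid analysis can be illustrated briefly. The sketch below is a simplification and not Ferguson's actual computation: it extracts only the first centroid factor, assumes that communality estimates already stand in the diagonal of the matrix, and omits the sign reflections needed before further factors are extracted. The function names and the three-variable matrix are hypothetical.

```python
from math import sqrt


def first_centroid_loadings(r):
    """First centroid loadings: each column sum divided by the square
    root of the grand total of the correlation matrix."""
    size = len(r)
    column_sums = [sum(row[j] for row in r) for j in range(size)]
    grand_total = sum(column_sums)
    return [s / sqrt(grand_total) for s in column_sums]


def residual_matrix(r, loadings):
    """Correlations left over after the factor's contribution is removed."""
    size = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(size)]
            for i in range(size)]


# Hypothetical matrix generated by a single factor, r_ij = a_i * a_j,
# with the communalities a_i squared in the diagonal.
true_loadings = [0.8, 0.6, 0.5]
r = [[a * b for b in true_loadings] for a in true_loadings]

loadings = first_centroid_loadings(r)
residuals = residual_matrix(r, loadings)
print([round(a, 2) for a in loadings])
```

Because the illustrative matrix was built from one factor, the first centroid recovers the generating loadings and leaves residuals of zero; with real data the residuals would be examined, reflected where necessary, and analyzed for further factors.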


FIG. 4. Factor structure of the primary social attitudes: religionism,
humanitarianism, and nationalism, principal projections only. (The angular
separations are 87 degrees between Factors I and II, 65 degrees between
Factors I and III, and 61 degrees between Factors II and III. The sum of the
two latter angular separations need not equal the first as the diagram might
seem to suggest.)


In Fig. 4 Factors I and II can be said to be orthogonal, that is,
unrelated to each other, but Factors II and III exhibit a substantial
relationship. Nevertheless, these last two factors are clearly distinct,
and can be treated operationally as different variables.
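The statement about orthogonality can be made concrete by converting the angular separations into correlations. The sketch below assumes the usual convention that the correlation between two factor axes equals the cosine of the angle between them, and it takes the 65-degree figure in the legend of Fig. 4 to refer to Factors I and III; both points are assumptions of this illustration.

```python
from math import cos, radians

# Angular separations reported in the legend of Fig. 4, in degrees.
angles = {("I", "II"): 87, ("I", "III"): 65, ("II", "III"): 61}

# Assumed convention: correlation between axes = cosine of their angle.
correlations = {pair: round(cos(radians(degrees)), 2)
                for pair, degrees in angles.items()}

for pair, value in correlations.items():
    print(pair, value)
```

On this reading Factors I and II correlate only about .05, which is why they can be called orthogonal, while Factors II and III correlate about .48.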

One of the characteristics of a statistically and psychologically
significant factor is that its factor loadings remain invariant in
different test batteries. Table 42 shows that the factor loadings under
consideration possess this characteristic of invariance, so we can
conclude that our factors possess at least one of the characteristics
prerequisite to their being considered statistically and
psychologically significant.

A second characteristic of a statistically and psychologically 
significant factor is that it appears in the same form in data based 
upon different subject populations. Our factors meet this test also. 
They were isolated independently from data for two subject popula- 
tions: Stanford students in 1937 and Connecticut students in 1941. 
The results secured from these analyses are almost identical.

A third characteristic of a statistically and psychologically signif- 
icant factor is that it is subject to a logical and rational explanation. 
Our factors seem to pass this test also. We know from common-sense 
observation and from general knowledge that attitudes toward 
birth control, evolution, and God vary in definite and specific ways 
in relation to each other. Fundamentalists believe in God and
disbelieve in evolution and birth control. Liberals tend to believe in
evolution and birth control and to believe less in God. Our factor
analysis, linking together attitudes toward God, evolution, and birth
control, meets the test of common sense and appears to be a measure
of what can be called religionism, that is, a strict devotion to religion.

Let us now consider our second factor. We find attitudes favoring 
war linked with attitudes favoring capital punishment and the harsh 
treatment of criminals. The three variables subsumed by this factor 
have reference to methods of dealing with delinquents. In two
instances, these delinquents are individuals, and in the other, a nation.
Apparently (and this makes good common sense), a person who
believes the way to handle a criminal is to treat him roughly and to
make an example of him applies this same kind of remedy to the
settlement of international disputes. Conversely, a person who
believes in a more humane type of treatment for an individual
offender also believes in the same type of treatment for the national
deviate. This factor, therefore, can be called humanitarianism.

Finally, let us consider Factor III. Here we have subsumed the
more specific attitudes toward patriotism, law, censorship, and
communism. It certainly accords with common-sense observation
that the patriotic person is an upholder of the law and of the
practice of censorship and is opposed to communism. In accord with our
present-day cultural beliefs, we can call this factor nationalism.

We must consider ourselves fortunate in finding such easily
interpretable factors. In many studies we would not achieve such a happy
result. Many studies will end right at this point, therefore, because 
no logical, rational, or common-sense explanations of the factors will 
be forthcoming. This is one of the occupational hazards of factor- 
analysis research. 

Determination of Factor Scores. We assume, at this point, that all 
necessary axis rotations have been completed. This being the case, 
we compute, from these rotated factor loadings, factor scores for 
each individual in our subject population. A convenient method of 
doing this is suggested by Godfrey Thomson in his book The Factorial 
Analysis of Human Ability. These calculations give weights of 0 
to any scales not related to the primary axis, and varying weights, as 
they should, to the scales related to the primary axis. They also 
yield estimates of the validity with which factor scores can be pre- 
dicted from the scores on the subsumed scales. Based upon the 185 
Stanford University cases originally processed, the validities of the 
factor scores were found to be as follows: 


Factor I ........ [?]
Factor II ....... [?]
Factor III ...... .90


In determining the validities of factor scores, we must be on
guard against arriving at spuriously high estimates. The formula
suggested by Thomson is such that validity increases as we increase
the number of tests in our battery. Theoretically, we could secure
perfect validity if we used a sufficient number of tests. Now have
we, in the present instance, secured spuriously high estimates of
validity by using alternate forms of the same tests in our factor
teams? When we try to predict the validity of scores for Factors I
and II, should we consider that we have six tests or only three? In
Factor I we actually have two forms of a scale measuring attitude
toward God, two forms of a scale measuring attitude toward
evolution, and two forms of a scale measuring attitude toward birth
control. Fundamentally, we have only three attitude variables, so
our question is whether or not we should derive the validity estimates
from three tests rather than from six tests. And, if this is the case,
should we compute these validities from the factor loadings we have
already presented? Or should they be based on factor loadings
derived from new matrices of the intercorrelations among three tests
rather than among six tests? This same reasoning holds for Factors
II and III, but in the last instance we must remember that the
choice will be between four and eight tests rather than between
three and six tests.

To answer the questions we have just raised, the original matrices
were subdivided so that the correlations based on Forms A of all
scales and those based on Forms B of all scales could be separately
analyzed. We find the same pattern, almost the same factor loadings,
and the same communalities in these new analyses as we found in the
original analyses based upon the larger matrices. The mean difference
between factor loadings determined from the two sets of reduced
matrices is only .06. That between the new loadings and those
derived from the original larger matrices is only .03. The average
difference in communality coefficients between the reduced and the
original matrices is .04. We can conclude that the inclusion of
alternate test forms inflated neither the original factor loadings nor
the communalities.

The reliabilities of our factor-score estimates vary between [illegible]
and .96. These estimates were derived from the correlations between
the factor scores obtained from Forms A of our attitude scales and
those derived from Forms B of our scales, when both were included
in the same matrix. After these correlations were obtained, they
were entered in the Spearman-Brown Prophecy Formula to give us
our final estimates.
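The Spearman-Brown step can be sketched as follows. The function name `spearman_brown` and the correlation of .85 are hypothetical; the formula itself is the standard one for a test lengthened k times, with k = 2 when two alternate forms are combined.

```python
def spearman_brown(r, k=2):
    """Predicted reliability of a test lengthened k times, given the
    correlation r between comparable parts (k = 2 for two forms)."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical correlation between Form A and Form B factor scores.
r_between_forms = 0.85
combined_reliability = spearman_brown(r_between_forms)
print(round(combined_reliability, 2))
```

A correlation of .85 between the two forms would thus be stepped up to about .92 for the combined forms.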

Selection of Criterion Groups. When factor scores have become
available, we can select our criterion groups. We need two groups for
each factor variable: one with high factor scores and one with low
factor scores. The proportion of the total subject population to be
included in each criterion group can be set at the upper and lower
thirds, the upper and lower 25 per cent, the upper and lower 27 per
cent, or at any other desired figure. In the present case, Ferguson
followed Truman L. Kelley's recommendation to use the upper and
lower 27 per cent. Ferguson did this because of Kelley's contention
that the use of the upper and lower 27 per cent offers the best
compromise between the two variables which can affect the reliability
of our results. These two variables are the number of cases and the
scalar distance between the two groups. These variables are
inversely related. As we increase the number of cases, the scalar
distance contracts. And as we increase the scalar distance, the number
of cases diminishes. Kelley suggests that the point of maximum
reliability is achieved when we dichotomize at 27 per cent.
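The selection of upper and lower criterion groups can be sketched as follows. The function name `criterion_groups`, the subject labels, and the factor scores are hypothetical; the proportion defaults to the 27 per cent used by Ferguson but can be set to a third or a quarter.

```python
def criterion_groups(factor_scores, proportion=0.27):
    """Split subjects into high and low criterion groups.

    factor_scores: dict mapping subject label -> factor score.
    Returns (high_group, low_group) as lists of subject labels.
    """
    ranked = sorted(factor_scores, key=factor_scores.get)  # low to high
    group_size = round(len(ranked) * proportion)
    if group_size == 0:
        return [], []
    return ranked[-group_size:], ranked[:group_size]


# Hypothetical factor scores for 100 subjects.
factor_scores = {f"s{i:03d}": i / 10 for i in range(100)}

high_group, low_group = criterion_groups(factor_scores)
print(len(high_group), len(low_group))
```

Raising the proportion enlarges both groups but draws them closer together on the scale, which is the trade-off Kelley's 27 per cent is meant to balance.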

Item Analysis. By item analysis we mean, of course, any of the 
many processes by which we can find which items differentiate and 
which items do not differentiate between our contrasting criterion 
groups. If we decide to use the upper and lower thirds of our subject 
population as criterion groups, we can compute for these groups the 
critical ratios of the differences between the percentages agreeing 
with each of our attitude statements. If we decide to use the upper 
and lower 25 per cent of our subject population as criterion groups, 
we can use Guilford’s abac (see page 297 of his text Fundamental 
Statistics in Psychology and Education) for determining item
significance. In the study under review, Ferguson chose to use the upper
and lower 27 per cent of the subject population as criterion groups.
This made it possible for him to use Flanagan’s item-weighting chart
to determine the correlation of each item with the factor scores. To
use Flanagan’s chart we compute for each criterion group the
percentage agreeing with each of our attitude statements. We enter
these percentages in Flanagan’s chart and read off the correlation
of the item response with the factor variable.

We have already indicated that the number of items to be included
in a scale must be decided in an arbitrary manner. In the
original editions of the primary social attitude scales, Ferguson used
38 items for Factors I and II and 64 items (32 items in each of two
alternate forms) for Factor III. In the revised editions Ferguson
used a uniform number of 25 items in each form of each scale. In
selecting these items, he excluded from consideration items endorsed
by more than 90 per cent of the subjects or by less than 10
per cent of the subjects in either of the two criterion groups. Of the
items remaining, those with the highest correlations with the factor
scores were retained to comprise the final scales.
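
The item-selection rule just described can be sketched as follows. The record layout (`id`, `pct_upper`, `pct_lower`, `r`) and the function name `select_items` are hypothetical. Ranking by the absolute value of the correlation is also our assumption; the text says only that the items with the highest correlations were retained.

```python
def select_items(items, n_keep=25, low=10.0, high=90.0):
    """Pick the items that best differentiate the criterion groups.

    items: list of dicts with the (hypothetical) keys
        'id'        - item identifier
        'pct_upper' - per cent endorsing in the upper criterion group
        'pct_lower' - per cent endorsing in the lower criterion group
        'r'         - correlation of the item with the factor scores
    An item is dropped if either group endorses it more than `high`
    per cent or less than `low` per cent of the time.
    """
    eligible = [
        item for item in items
        if low <= item["pct_upper"] <= high and low <= item["pct_lower"] <= high
    ]
    # Assumption: strength of relationship is judged by |r|, so that
    # negatively keyed items can also be retained.
    eligible.sort(key=lambda item: abs(item["r"]), reverse=True)
    return [item["id"] for item in eligible[:n_keep]]


if __name__ == "__main__":
    pool = [
        {"id": "a", "pct_upper": 95, "pct_lower": 40, "r": 0.9},
        {"id": "b", "pct_upper": 80, "pct_lower": 20, "r": 0.6},
        {"id": "c", "pct_upper": 30, "pct_lower": 70, "r": -0.7},
    ]
    print(select_items(pool, n_keep=2))
```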

Assignment of Item Weights. Following our selection of items, we
are faced with the problem of assigning numerical scoring weights.
We can base these weights upon our a priori judgment, upon any
of the numerous item-analysis charts or abacs, or upon biserial
correlations showing how each item correlates with the factor
involved. In the present instance, weights were based on the
correlations derived from Flanagan’s chart. The correlation for each item
was rounded to the nearest one-digit number, the decimal was
disregarded, and the result became the scoring weight.

The proper procedure to follow will depend upon the setting of the
study, its purposes, goals, and so forth. A useful rule to keep in mind,
however, is that of separating the item-weighting process from the
process involved in determining item significance. If we do not
separate these two processes, we are likely to fall into the error of
assigning differential weights upon some nonrelevant basis.
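
A minimal sketch of the weighting rule described above (round the correlation to one digit, then drop the decimal point) is given here. The function name `scoring_weight` is hypothetical, and the handling of exact ties is simply whatever Python's built-in `round` does; the source does not say how ties or correlations above .95 were treated.

```python
def scoring_weight(correlation):
    """Turn an item-factor correlation into an integer scoring weight.

    A correlation of .43 is rounded to .4 and the decimal point is
    dropped, giving a weight of 4. Negative correlations give negative
    weights. Ties such as .45 follow Python's round-half-to-even rule,
    which is an assumption of this sketch, not a rule from the source.
    """
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return round(correlation * 10)


if __name__ == "__main__":
    for r in (0.43, -0.68, 0.07):
        print(r, scoring_weight(r))
```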

We suggest avoiding the use of critical ratios, a frequently used
criterion, as the basis for item weighting. In the first place, a critical
ratio indicates nothing more than the probability that any difference
at hand can be ascribed to non-chance rather than to chance factors.
It does not indicate an amount, or degree, or a magnitude of
relationship. A second reason for avoiding the use of critical ratios as a
basis for item weighting is that the value of a critical ratio is so
dependent upon the number of cases. It is true that the more cases
available, the greater the confidence we can have in our results. But
this greater confidence does not mean that the relationship under
consideration has increased in magnitude. We must be careful,
therefore, not to let an increase in the number of cases, and its
consequent augmentation of a critical ratio, deceive us into believing
that a greater relationship is indicated and that a higher scoring
weight is required. This leads us to the conclusion that we should use
as a basis for item weighting some statistic, the numerical magnitude
of which is not affected by the number of cases we need in our
criterion groups.
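
The dependence of a critical ratio on the number of cases can be made concrete with a small computation. The formula used below, the difference between two proportions divided by its unpooled standard error, is one common form and is our assumption; the text does not state which formula was intended. The proportions and group sizes are made up for illustration.

```python
import math


def critical_ratio(p1, n1, p2, n2):
    """Difference between two proportions divided by its standard error.

    p1, p2: proportions agreeing in the two criterion groups (0 to 1).
    n1, n2: number of cases in each group.
    Uses the unpooled standard error, an assumption of this sketch.
    """
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


if __name__ == "__main__":
    # The same 60 per cent versus 40 per cent split at three group sizes.
    for n in (50, 200, 800):
        print(n, round(critical_ratio(0.60, n, 0.40, n), 2))
```

The difference in endorsement is identical in every row, yet the critical ratio doubles each time the group size is quadrupled. A weight based on it would therefore reflect sample size rather than strength of relationship.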

Determination of General Attitude Scores. Our next step is to arrive
at attitude scores derived from the items we have selected for our
scales. This can be a simple process, and, in the present instance,
involves nothing more than an algebraic summation of the numerical
weights associated with the statements which each subject endorses.
We have six scores to derive for each subject. These are his scores on
Forms A and B of each of our three attitude variables.
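
The algebraic summation described above amounts to a single line of arithmetic. In this sketch the item numbers, the weights, and the function name `attitude_score` are all hypothetical.

```python
def attitude_score(item_weights, endorsed_items):
    """Sum the signed weights of the statements a subject endorses.

    item_weights: dict mapping an item number to its scoring weight.
    endorsed_items: the item numbers the subject agreed with.
    """
    return sum(item_weights[item] for item in endorsed_items)


if __name__ == "__main__":
    # Made-up weights for a three-item form; item 2 is negatively keyed.
    form_a_weights = {1: 4, 2: -3, 3: 6}
    print(attitude_score(form_a_weights, [1, 2]))
```

The same function would be applied six times per subject, once for each form of each of the three scales.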

Preparation of Norms. Our preparation of norms can proceed in
the same way as for any other psychological measuring device.
The steps in this process are the giving of the completed scales to a
standardization group, the preparation of raw-score and standard-score
distributions, and the tabulation of percentile norms.

Reliability. The reliabilities of the scores on the scales we have
just discussed are presented in Table 43. The values for the composite
scores were determined by correlating the scores on Form A
with those on Form B and by entering the result in the Spearman-Brown
Prophecy Formula. The reliabilities for the alternate forms
were determined by correlating the scores on the alternate forms
with each other.

TABLE 43. Reliability Data for the Primary Attitude Scales*

Factor I: Religionism ..............  [illegible]
    Form A .........................  .90
    Form B .........................  [illegible]
Factor II: Humanitarianism .........  [illegible]
    Form A .........................  .85
    Form B .........................  .85
Factor III: Nationalism ............  .88
    Form A .........................  .78
    Form B .........................  .78

* From Ferguson, L. W. A revision of the primary social attitude scales. J. Psychol., 1944,
17, 229-241.
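
The Spearman-Brown step can be illustrated with a short computation. The formula is the standard one for a test lengthened k times; the function name `spearman_brown` is ours. Entering the alternate-form value of .78 reported for nationalism gives about .88, which agrees with the composite figure shown for that scale in Table 43.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability of a test lengthened k times.

    r: correlation between the two forms (the alternate-form reliability).
    k: lengthening factor; k = 2 corresponds to combining Forms A and B.
    """
    return k * r / (1.0 + (k - 1.0) * r)


if __name__ == "__main__":
    # Alternate-form reliability reported for the nationalism scale.
    print(round(spearman_brown(0.78), 2))
```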

Validity. Our usual definition of a valid test is that it is a test
which measures that which it is supposed to measure. In the case of
the general attitude scales we have just discussed, they are supposed
to measure the factors which we isolated in our factor analyses. Our
scales will be valid scales if they correlate highly with our factor
scores. And we find that they do. Scores on primary attitude scale
I correlate .92 with scores on Factor I. Scores on primary attitude
scale II correlate .92 with scores on Factor II. And scores on primary
attitude scale III correlate .90 with scores on Factor III.

These validity coefficients are fine, as far as they go. They show
that we can take a set of 25 or 50 statements and manipulate them
together in some way to give us a score that is much the same as we
get when we manipulate these same items some other way. Actually,
these validity coefficients beg the real question. This question is,
“How valid are our scales in measuring the real attitudes of people?”
Does primary attitude scale I accurately measure religious attitudes?
Does primary attitude scale II accurately measure humanitarian
(or aggressive) attitudes? And does primary attitude scale III
accurately measure nationalistic attitudes? Two attempts have been
made to answer these questions. One attempt has reference to the
religionism scale, and the other has reference to the nationalism
scale. The less complex of these two attempts is that relating to the
nationalism scale, so we shall discuss it first.

This study consisted in a straightforward attempt to relate the
scores on the nationalism scale to a second “recognized” measure
of nationalistic attitudes. This second measure is contained in the
“Survey of Opinion on Methods of Preventing War,” a survey
prepared by the Committee on War and Peace of the Society for
the Psychological Study of Social Issues (SPSSI). The items in the
SPSSI survey cover attitudes toward patriotism and internationalism,
national honor and anti-imperialism, tariffs, militarism, socialism
and communism, and international relations. The statements
were not originally presented as a scale, but the SPSSI Committee
has indicated which answers it considers nationalistic and which
answers it considers internationalistic. We have in this survey a
consensus of expert psychological opinion as to what nationalistic
and internationalistic attitudes are supposed to be.

A group of 158 students at the University of Connecticut were
asked to take both the SPSSI survey and the nationalism scale. On
each question in the SPSSI survey these 158 students were divided
into those who gave the nationalistic answers and those who gave
the internationalistic answers. Then for each of these groups mean
scores on nationalism were determined. The results are given in
Table 44. Without exception the group giving the nationalistic
answers on the SPSSI survey receives a higher mean score on
nationalism than does the group giving the internationalistic answers.

TABLE 44. Mean Nationalism Scores for Students Giving Nationalistic and
Internationalistic Answers to the Items in the SPSSI Survey on Methods of
Preventing War*

We, the people of the United States, in order to preserve peace should:

 1. Build up our military strength, on land and sea and in the air, so that
    no nation or combination of nations would dare to attack us.
 2. Join with other peace-loving nations in economic and other non-military
    measures to prevent further attacks by any country.
 3. Stop giving military protection to our citizens, or their trade or their
    property, in other parts of the world.
 4. Establish higher protective tariffs, so as to build up American industry
    to a point of self-sufficiency where it will be independent of the
    entanglements resulting from foreign trade.
 5. Educate the American people to realize that strong labor unions are a
    great bulwark against war and fascist tendencies in the United States.
 6. Reduce our naval and air strength until it is only strong enough to
    defend our own shores and Hawaii, not the Philippines, nor our trade and
    investments in the Far East.
 7. Educate American children in the fundamentals of patriotism, making sure
    they realize that America has always stood for peace and justice among
    the nations.
 8. Take the lead in reducing tariffs, with reciprocal trade treaties
    wherever possible, so as to lower the economic barriers which now
    separate nation from nation.
 9. Oppose socialism, communism and other alien philosophies which threaten
    to make America more like the war-making European dictatorships.
10. Make it perfectly clear that America is ready to defend herself—that
    anyone who attacks our honor or our vital interests must count on
    fighting it to a finish.
11. Educate children to be international-minded—to support any movement
    which contributes to the welfare of the world as a whole, regardless of
    special national interests.
12. Establish social ownership of industry, in order to eliminate the
    autocratic power of the big business men who now profit from war, or
    from the policies which lead to war.

Mean nationalism scores for items 1 to 12 (column entries in the order in
which they survive; their alignment with the items is not recoverable):
    Yes: 35, 30, 30, 40, 39, 14, 25, 37, 34, 29, 24
    No:  14, 40, 34, 28, [illegible], 34, 16, 40, 21, 41

TABLE 44. (Continued)

Mean nationalism scores
  Yes    No     Answers        Item
  26     36     Yes  ?  No     13. Help establish a “United States of the World” in
                                   which America would be about as independent as the
                                   state of Illinois is now in the U.S.A.
  34     28     Yes  ?  No     14. Keep out of alliances now but support a League of
                                   Nations, with strong armed forces of its own, as soon
                                   as there is any prospect of establishing it on a
                                   world-wide basis.
  35     23     Yes  ?  No     15. Permanently keep away from entangling alliances
                                   which might limit our national freedom of action or
                                   involve us in the quarrels of other nations.

* Data from Ferguson, L. W. The isolation and measurement of nationalism. J. soc. Psychol.,
1942, 16, 215-228.
† Nationalistic answer is italicized.

The second study on validation concerns the scores on the religionism
scale. This study represents a considerably more complex
approach than the one we have just considered, so our discussion will
have to be more detailed. The basic principle, however, is simple
enough. This was to secure groups of subjects whose attitudes could
be known without reference to their scores on the religionism,
humanitarianism, and nationalism scales; then to compare these
groups on the attitude scales and see if they could be differentiated
from one another.

The groups selected for study were 46 Catholics, 91 Protestants,
and 33 Jews. The subjects in each of these groups expressed a
definite preference for, or claimed membership in, one or the other
of these religious faiths. Therefore, if the subjects can be considered
typical representatives of their respective faiths, we can say that
certain of their religious attitudes are known. It should follow that
any scale purporting to be a measure of religious attitudes should
reveal important differences among these three groups. If the
religionism scale is valid to any substantial degree, it should yield
larger differences than should the scales on humanitarianism and
nationalism. And we should find the most pronounced differences
between Catholics and Jews. Our data, as we shall see, fulfill these
two expectations completely.

The first step in this validation study consisted of the construction 
of nine ideological scoring keys. These keys, three for each primary 
attitude variable, were based upon the differences in responses 
between Catholics and Jews, between Catholics and Protestants, and 
between Protestants and Jews. 

Diagnostic items were found by dividing criterion group differences
by 10 and by taking as the scoring weights the unit figures
which most nearly approximated these quotients. For the Catholic-Jewish
continuum, item 1 of the religionism scale was assigned a
weight of —1 (67 per cent, the figure for Catholics, minus 76 per
cent, the figure for Jews, equals —9. When this is divided by 10, it
gives —.9 which is very nearly —1.). When item weights had been
determined, they were used in scoring the attitudes of 100 subjects
not included in the criterion groups. We secured a Catholic-Jewish
score, a Catholic-Protestant score, and a Protestant-Jewish score.
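
The arithmetic behind these ideological scoring weights can be checked with a short sketch. The function name `continuum_weight` is hypothetical, and the treatment of exact ties (a quotient ending in .5) follows Python's built-in `round`, since the source does not say how such cases were handled.

```python
def continuum_weight(pct_first_group, pct_second_group):
    """Scoring weight for one item on a two-group continuum.

    The difference between the two groups' percentages of agreement is
    divided by 10 and rounded to the nearest whole unit.
    """
    return round((pct_first_group - pct_second_group) / 10)


if __name__ == "__main__":
    # Item 1 of the religionism scale: 67 per cent of Catholics and
    # 76 per cent of Jews agree, as reported in the text.
    print(continuum_weight(67, 76))
```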

The reliabilities of these scores are reported in Table 45. The
reliabilities of the scores based upon the items in the religionism scale
are higher than those of the scores based upon the items in the
humanitarianism and nationalism scales. Also, Catholic-Jewish
scores are more reliable than Catholic-Protestant scores or
Protestant-Jewish scores. The average of the reliabilities for religionism
is .85. The average for humanitarianism is .66, and the average for
nationalism is .55.

The three sets of Catholic-Jewish scores possess an average
reliability of .78; the three sets of Catholic-Protestant scores possess
an average reliability of .74; and the three sets of Protestant-Jewish
scores possess an average reliability of .55. Three of the coefficients
presented in Table 45 are so low that little confidence can be placed
in the accuracy of individual scores. But the items on the religionism
scale yield scores of high reliability.

TABLE 45. Reliabilities of the “Religious” Scores*

Scale                                  I       II      III     Average
Catholic-Jewish continuum ......     .867    .765    .708      .78
Catholic-Protestant continuum ..     .796    .805    .608      .74
Protestant-Jewish continuum ....     .883    .410    .346      .55
Average ........................     .85     .66     .55

* From Ferguson, L. W. The sociological validity of primary social attitude scale No. I:
Religionism. J. soc. Psychol., 1946, 23, 197-204.

Correlations among the Catholic-Protestant, Catholic-Jewish, and
Protestant-Jewish scores are presented in Table 46. These correlations
decrease from an average of .93 for religionism to .60 for
humanitarianism and to .38 for nationalism. On religionism Catholics
secure the highest scores and Jews the lowest. The linear arrangement
is Catholics > Protestants > Jews.

TABLE 46. Correlations among the “Religious” Scores*

Scales intercorrelated                                       I             II            III           Average
Catholic-Jewish scores vs. Catholic-Protestant scores ..   .970          .828          [illegible]     .88
Catholic-Jewish scores vs. Protestant-Jewish scores ....   [illegible]   [illegible]   .679            .73
Catholic-Protestant scores vs. Protestant-Jewish scores    .873          .203          [illegible]     [illegible]
Average ................................................   .93           .60           .38             .65

* From Ferguson, L. W. The sociological validity of primary social attitude scale No. I:
Religionism. J. soc. Psychol., 1946, 23, 197-204.

The reverse arrangement holds true for scores on the humanitarianism
scale. Jews secure higher humanitarian scores than Protestants,
and Protestants secure higher humanitarian scores than
Catholics.

Scores based on the nationalism scale yield the same arrangement
as those based on the religionism scale. Catholics secure the highest
scores, and Jews secure the lowest. The linear arrangement is
Catholics > Protestants > Jews.

How do the sets of ideological difference scores compare with the
original general attitude scores? The various sets of religious
ideological difference scores correlate highly with scores on religionism,
their average correlation being .93. They do not correlate so well
with the other scales, however. The average correlation drops to
.76 for humanitarianism and to .67 for nationalism. These correlations
are sufficiently large, however, to suggest that ideological
differences other than those of a strictly religious nature serve to
differentiate religious groups from each other. Among other things,
the data indicate that Jews are more humanitarian than Protestants,
that Protestants are more humanitarian than Catholics, that
Protestants are more nationalistic than Catholics, and that Catholics are
more nationalistic than Jews.

The point of paramount importance in this study is the high
correlation between the ideological difference scores and scores on
religionism. The item weights for the two scoring systems were
derived independently, in markedly different ways, and for different
subject populations. Therefore, the correlation between them is
authentic and psychologically significant. And this significance is
augmented by the fact that the ideological scales based upon the
items in the humanitarianism and nationalism scales are not highly
related to those based upon the items in the religionism scale.

Our concern in this chapter has been to outline the steps in the
development of the Thurstone equal-appearing-interval technique of
attitude-scale construction, to discuss some of the problems
encountered in the application of this technique, and to show two ways
in which attempts have been made to seek greater generality in its
usage. We shall now leave the Thurstone technique of equal-appearing
intervals and shall turn our attention in the next chapter to a
markedly different approach to the problems involved in attitude
measurement.


5 


ATTITUDES: AN A POSTERIORI APPROACH 


We are to examine in this chapter a second major technique for the 
measurement of attitudes. This is the method of summated ratings, 
developed by Rensis Likert. In the last chapter we called this method 
an a posteriori approach, because, in contrast to the Thurstone 
technique of equal-appearing intervals, scale values are determined 
after, rather than before, the collection of attitude data. 
Likert first described the method of summated ratings in a mono- 
graph published in 1932. In this monograph, “A Technique for the 
Measurement of Attitudes,” Likert describes the development of 
scales to measure attitudes toward internationalism, imperialism, 
and the Negro. The first two of these scales are now out of date, but 
the third scale, attitude toward the Negro, can still be used. 
A second major report on the use of the method of summated 
ratings is contained in the monograph “Personality in the Depres- 
sion.” This monograph was prepared by Rundquist and Sletto and 
was published in 1936. In this monograph Rundquist and Sletto 
show how they applied Likert’s technique in the development of 
scales to measure general morale, inferiority, family relations, respect
for law, economic conservatism, and attitude toward the value of
education. A description of the methods used in the development of
these scales will serve two purposes. It will illustrate, better than
some of Likert’s own data, the method of summated ratings. And it
will put us in a position to understand the development of one of
the personality inventories that we plan to discuss in Chap. 7.

The basic assumptions in the method of summated ratings are
that each statement in our scale covers the entire attitude
continuum; that specific points on this scale can be indicated by
alternative responses to each statement; that the points to be represented
by the alternative responses can be determined from a knowledge
of the percentage of subjects who give each of these responses; and
that an individual’s attitude can be determined from a summation
of his responses to all statements in the scale. This final score is to be
viewed as a kind of average estimate which we get from the
application of a number of different yardsticks (the different statements),
each one of which extends the whole length of the attitude
continuum. This is in marked contrast to the Thurstone concept that
each statement in an equal-appearing-interval scale represents only
a specific part of the attitude continuum.


DEVELOPMENTAL STEPS 


The preparation of an attitude scale in accord with the Likert 
technique proceeds through the following steps: 


1. A list of statements is collected. 
2. These statements are edited. 


3. The edited statements, constituting the attitude scale, are given to the 
individuals whose attitudes are to be measured. 


4. The number and percentage of subjects giving the alternative responses to
each statement are determined. 


5. Scoring weights for the alternative responses to each statement are deter- 
mined. 


6. The scale is scored. 

7. Item-consistency data are secured. 

8. Inconsistent items, if any, are eliminated. 
9. The revised scale is rescored. 
10. Norms are prepared. 


We shall discuss each of these steps in some detail. In connection 
with each one we shall outline the problems involved and shall 
explain various solutions which are available. 

Collection of Statements. These statements may be made up by 
the investigator, they may be taken from preexisting scales, they 
may be culled from newspapers and periodicals, or they may be 
selected from comments and talks of appropriate authorities. Each 
statement should be one to which individuals having different 
attitudes will, if given a chance, respond differently. Therefore, we
must avoid statements of fact, ambiguous statements, double-barreled
statements, statements having several parts, and statements
reflecting more than one attitude variable.

Editing of Statements. Our statements must be edited to ensure
a terminology consistent with the purposes to be served by the
completed scale and to assure their appropriateness for the alternative
responses we decide to allow. These alternative responses can
be either “Yes,” “?,” “No”; “strongly approve,” “approve,”
“undecided,” “disapprove,” “strongly disapprove”; or “strongly
agree,” “agree,” “undecided,” “disagree,” “strongly disagree.”
Of course, other wordings and other numbers of alternatives are
possible. We have given here those used by Likert, and by Rundquist
and Sletto.

When our list of statements is complete, we must give it a pre- 
liminary tryout. This will elicit comments and queries about am- 
biguities and obscurities that we had not previously detected and 
will enable us to make the necessary corrections before we prepare 
the final form of our scale. In the Likert-type attitude scale the 
editing of statements is a much more important step than it is 
in a Thurstone-type scale. The reason for this is that there is no 
such objective check upon ambiguity in the Likert method as there 
is in the Thurstone procedure. Therefore, there is more opportunity 
for an ambiguous statement to remain undetected in a Likert-type 
scale than there is in a Thurstone-type scale. This being the case, we 
must exercise all the care we can during the editing process.

There is no set number of statements to be collected, nor any 
fixed number of statements to be included, in a Likert-type attitude 
scale. Likert used 24 statements in his internationalism scale, 12 
items in his imperialism scale, and 15 items in his Negro scale. Rund- 
quist and Sletto used a uniform number of 22 statements in each 
of their six scales. This number can vary with the investigator and 
with his ability to prepare useful and significant statements. 

Preliminary Tryout. Likert’s scale for measuring attitude toward
the Negro is presented in Table 47, and one of the six scales
prepared by Rundquist and Sletto is given in Table 48. When these
scales were first used, the statements were not segregated, however,
as they are in these tables. Likert’s statements on the Negro were
included in an omnibus “Survey” along with his other statements on
imperialism and internationalism, and the statements in Rundquist
and Sletto’s scales were also “scrambled.” No one scale stood out as
a separate unit.

The desirability of isolating statements or of including them
unidentified among statements pertaining to other attitude variables
depends, to a large extent, upon the purposes of the investigator
and upon the need which exists for the utilization of one scale
separately from the others with which it can be combined. Neither
in Likert’s study nor in Rundquist and Sletto’s was there any need
to have the statements of the different scales isolated from each
other. It was advantageous, therefore, for these investigators to
include their statements in lengthy omnibus lists. In this way some
of the halo that might have accrued to statements in isolation may
be assumed to have been effectively curtailed.

TABLE 47. Statements in Likert’s Scale Measuring Attitude toward the Negro*

 1. Would most negroes, if not held in their place, become officious, overbearing, and
    disagreeable?†
 2. If you went into a cafeteria in a northern city, sat down, and then realized you were at
    the table with a negro, would you leave the table?
 3. Would you shake hands with a negro?
 4. Do you disapprove of the use of the term “nigger”?
 5. If you heard of a negro who had bought a home or a farm would you be glad?
 6. In a community in which the negroes outnumber the whites, under what circumstances is
    the lynching of a negro justifiable?
    a. Never.
    b. In very exceptional cases where a specially brutal crime against a white person calls for
       swift punishment.
    c. As punishment for any brutal crime against a white person.
    d. As punishment for any gross offense (felony or extreme insolence) committed against
       a white person.
    e. As punishment for any act of insolence against a white person.
 7. How far in our educational system (aside from trade education) should the most intelligent
    negroes be allowed to go?
    a. Grade school.
    b. Junior high school.
    c. High school.
    d. College.
    e. Graduate and professional school.
 8. In a community where the negroes outnumber the whites, a negro who is insolent to a
    white man should be:
    a. Excused or ignored.
    b. Reprimanded.
    c. Fined and jailed.
    d. Not only fined and jailed, but also given corporal punishment (whipping, etc.).
    e. Lynched.
 9. All negroes belong in one class and should be treated in about the same way.‡
10. Negro homes should be segregated from those of white people.
11. Where there is segregation, the negro section should have the same equipment in paving,
    water, and electric light facilities as are found in the white districts.
12. If the same preparation is required, the negro teacher should receive the same salary as the
    white.
13. Practically all American hotels should refuse to admit negroes.
14. No negro should be deprived of the franchise except for reasons which would also
    disfranchise a white man.
15. In a community of 1,000 whites and 50 negroes, a drunken negro shoots and kills an officer
    who is trying to arrest him. THE WHITE POPULATION IMMEDIATELY DRIVE ALL THE
    NEGROES OUT OF TOWN.

* From Likert, R. A technique for the measurement of attitudes. Arch. Psychol., 1932, No.
140.
† Items 1 to 5 to be answered: Yes, ?, No.
‡ Items 9 to 15 to be answered: strongly approve, approve, undecided, disapprove, strongly
disapprove.

TABLE 48. Statements in the Rundquist-Sletto Morale Scale*

 1. The future is too uncertain for a person to plan on marrying.
 2. It is difficult to think clearly these days.
 3. The future looks very black.
 4. Life is just one worry after another.
 5. Most people can be trusted.
 6. Times are getting better.
 7. It does not take long to get over feeling gloomy.
 8. The day is not long enough to do one’s work well and have any time for fun.
 9. No one cares much what happens to you.
10. Any man with ability and willingness to work hard has a good chance of being successful.
11. It is great to be living in these exciting times.
12. These days one is inclined to give up hope of amounting to something.
13. There is little chance for advancement in industry and business unless a man has unfair
    pull.
14. The young man of today can expect much of the future.
15. This generation will probably never see such hard times again.
16. Real friends are as easy to find as ever.
17. Life is just a series of disappointments.
18. One seldom worries so much as to become very miserable.
19. A man does not have to pretend he is smarter than he really is to “get by.”
20. Success is more dependent on luck than on real ability.
21. A person can plan his future so that everything will come out all right in the long run.
22. There is really no point in living.

* From Rundquist, E. A., and Sletto, R. F. Personality in the Depression. Minneapolis,
Minn.: University of Minnesota Press, 1936.

Number and Percentage of Subjects Giving Alternate Responses. The
procedure we must follow here is illustrated in Table 49. This table
shows the number and percentage of subjects giving each of the
alternative responses to the statement “Most people can be trusted,”
one of the items in the Rundquist-Sletto morale scale. The figures in
column 1 show that 22 subjects strongly agree with this statement;
that 200 subjects agree with this statement; that 79 subjects are
undecided; that 144 subjects disagree with this statement; and that
55 subjects strongly disagree with this statement. In column 2 these
figures have been converted into percentages. It is necessary for us
to obtain similar data for every statement in our scale.


Table 49. Number and Percentage of Subjects Giving Different Answers to the Item
“Most people can be trusted”*


Response            | Number | Percentage
                    |        |
Strongly agree      |    22  |     4.40
Agree               |   200  |    40.00
Undecided           |    79  |    15.80
Disagree            |   144  |    28.80
Strongly disagree   |    55  |    11.00
Total               |   500  |   100.00


* From Rundquist, E. A., and Sletto, R. F. Personality in the Depression. Minneapolis,
Minn.: University of Minnesota Press, 1936.
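The tabulation in Table 49 can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `response_percentages` is our own and does not come from Rundquist and Sletto.

```python
# Response counts for the item "Most people can be trusted" (Table 49).
counts = {
    "Strongly agree": 22,
    "Agree": 200,
    "Undecided": 79,
    "Disagree": 144,
    "Strongly disagree": 55,
}


def response_percentages(counts):
    """Convert the number of subjects per response into percentages."""
    total = sum(counts.values())
    return {response: 100.0 * n / total for response, n in counts.items()}


percentages = response_percentages(counts)
for response, pct in percentages.items():
    print(f"{response:<18} {counts[response]:>4} {pct:7.2f}")
```

The same function would have to be applied to the counts for every statement in the scale.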

Determination of Scoring Weights. We can determine scoring 
values in any one of three different ways. Two of these ways were 
suggested by Likert. The third was suggested by Rundquist and 
Sletto. We can use an arbitrary weighting method, a standard-score 
weighting method, or a sigma-deviate weighting method. The last 
of these methods is the only one which meets the theoretical pre- 
conceptions of the Likert technique, so we shall describe it first. 

Sigma-deviate Weighting Method. Likert assumes, as does any 
investigator who uses the sigma-deviate scoring method, that atti- 
tudes are distributed normally. If an investigator were omniscient, 
he could word his statements and their alternate responses in such 
a way that only normal distributions would be obtained. The dis- 
tributions we shall obtain will depart from this ideal because our 
statements are faulty, because the alternative responses are not 
psychologically equidistant from each other, or because modal 
responses depart from intended points of neutrality. 

Our fundamental problem is that of determining the position on 
the attitude continuum that each alternative response can be said 
to represent. And because the distributions for the different state- 
ments in our scale will differ from each other, we have to determine 
these values for each statement. Likert does not explain as clearly 
as he might the method by which these scale positions are deter- 
mined. Guilford has come to our rescue, however, and in his text 
Fundamental Statistics in Psychology and Education has given a 
complete explanation of the manner in which these scale positions
can be determined. We shall follow closely Guilford’s explanation, 
but we shall use Likert’s data so we can compare our results with 
those which Likert presents. 


Table 50. Calculations Involved in the Sigma-deviate Weighting Method*


Columns: (1) proportion giving the response; (2) proportion giving a
lesser degree of agreement; (3) proportion giving the same or a lesser
degree of agreement; (4) ordinate corresponding to (2); (5) ordinate
corresponding to (3); (6) difference, (4) - (5); (7) sigma deviate,
(6) / (1); (8) scoring weight.

Response               (1)    (2)    (3)    (4)    (5)    (6)     (7)    (8)

Strongly approve       .13    .87   1.00   .211   .000   .211    1.63    34
Approve                .43    .44    .87   .394   .211   .183     .43    22
Undecided              .21    .23    .44   .304   .394  -.090    -.43    13
Disapprove             .13    .10    .23   .176   .304  -.128    -.99     8
Strongly disapprove    .10    .00    .10   .000   .176  -.176   -1.76     0


* Data from Likert, R. A technique for the measurement of attitude. Arch. Psychol., 1932,
No. 140.


Column 1 of Table 50 shows the proportion of subjects who give
each one of the alternate answers to one of the statements in Likert’s
scale on internationalism. We begin with these proportions, and,
using them as a basis for our calculations, we proceed as follows.


In column 2 we list opposite each indicated degree of agreement 
the proportion of cases giving answers indicating a lesser degree of 
agreement. We write the proportion .00 opposite “strongly dis- 
approve,” because there is no response indicating less agreement. 
We write the proportion .10 opposite “disapprove,” because this 
proportion of subjects answered “strongly disapprove,” and this 
response indicates less agreement than the response “disapprove.”

In column 3 we list opposite each indicated degree of agreement
the proportion of cases indicating the same or a lesser degree of
agreement. We write the proportion .10 opposite “strongly
disapprove,” because this proportion of subjects indicate this degree
of agreement (i.e., “strongly disapprove”) or a lesser degree of
agreement. We write the proportion .23 opposite “disapprove,” because
this proportion of subjects indicate this degree of agreement (i.e.,
“disapprove”) or a lesser degree of agreement (i.e., “strongly
disapprove”). The proportion to be entered opposite “strongly
approve” will always be 1.00, because all subjects will indicate a
complete or a lesser degree of agreement with a statement.

Our next step is to determine the value of the ordinates
corresponding to the proportions we have entered in columns 2 and 3.
We determine these values by looking them up in any one of the
numerous tables giving the ordinates corresponding to the proportions
of the area under a normal distribution curve. We list these ordinates
in columns 4 and 5. We subtract the entries in column 5 from the
corresponding entries in column 4 and enter the differences in
column 6.

Last, we divide the differences in column 6 by the proportions in
column 1. The quotients we list in column 7. These values correspond
to those given by Likert in his monograph. They show the position 
of each alternate response on the underlying attitude continuum. 
Our purpose in getting these values is that we want to use them as 
our scoring weights. But now that we have them, we can see that 
they would prove most inconvenient. Who wants to work with 
decimals and with minus numbers in scoring? 
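The steps leading to column 7 can be sketched in Python as follows. This is a minimal illustration, assuming that the table of ordinates is replaced by the unit normal curve in the standard-library class `statistics.NormalDist`; the helper names `ordinate` and `sigma_deviates` are our own. Because it works from unrounded ordinates, it agrees with the tabled values only to within about .01.

```python
from statistics import NormalDist

# Proportions giving each response (column 1 of Table 50), ordered from
# the least to the greatest degree of agreement.
proportions = [
    ("Strongly disapprove", 0.10),
    ("Disapprove", 0.13),
    ("Undecided", 0.21),
    ("Approve", 0.43),
    ("Strongly approve", 0.13),
]


def ordinate(p, eps=1e-9):
    """Height of the unit normal curve at the point cutting off area p."""
    if p <= eps or p >= 1.0 - eps:
        return 0.0  # the curve has no height at its two extremes
    unit_normal = NormalDist()
    return unit_normal.pdf(unit_normal.inv_cdf(p))


def sigma_deviates(proportions):
    """Return {response: sigma deviate}, following columns 2 to 7."""
    deviates = {}
    below = 0.0                                    # column 2
    for response, p in proportions:
        at_or_below = below + p                    # column 3
        difference = ordinate(below) - ordinate(at_or_below)  # column 6
        deviates[response] = difference / p        # column 7
        below = at_or_below
    return deviates


deviates = sigma_deviates(proportions)
for response, value in deviates.items():
    print(f"{response:<20} {value:6.2f}")
```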

We can easily eliminate the negative numbers. All we need to do is
add a constant of 1.76 to each of the values entered in column 7.
This makes all entries 0 or positive. And we can eliminate the
decimals by multiplying each of these figures by 10 and by rounding
the result to the nearest whole number. These operations give us
the values in column 8. These values preserve the essential
relationships we sought to obtain and are obviously much more
convenient for scoring.

We have illustrated the method of deriving scoring weights for
just one statement. We must remember that we have to repeat the
process for all statements in our scale. We have to derive the value
of the constant only once, however. We can make it equal in absolute
value to the largest negative sigma deviate. It should not be
computed separately for each item, for if we did, we would destroy its
relative position on the underlying attitude continuum.

Arbitrary Weighting Method. Likert noted, after the computation
of a number of sigma-deviate values, that they did not differ very
much from the relations involved in the simple series of arbitrary
weights 1, 2, 3, 4, 5. He suggests, therefore, that these values can
be assigned as scoring weights for the alternate responses to each
statement. Strangely enough, Likert found that scores resulting
from these weights correlate .99 with scores based upon the
sigma-deviate method.
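The conversion of sigma deviates into whole-number weights, and their resemblance to the series 1 to 5, can be sketched as follows. The sketch uses the column 7 values for the single statement of Table 50 and takes the constant from that statement alone; for a full scale the constant would be derived once from the largest negative deviate among all statements. The correlation computed here is between the two sets of weights for one item. It is offered only as an illustration and is not the correlation between total scores which Likert reports.

```python
from math import sqrt

# Sigma deviates from column 7 of Table 50, listed from the least to the
# greatest degree of agreement, and the arbitrary weights for the same
# responses.
sigma_deviates = [-1.76, -0.99, -0.43, 0.43, 1.63]
arbitrary_weights = [1, 2, 3, 4, 5]


def integer_weights(deviates, constant, multiplier=10):
    """Add a constant, multiply, and round to the nearest whole number."""
    return [round((d + constant) * multiplier) for d in deviates]


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# The constant equals the absolute value of the largest negative deviate
# (here taken from one statement only, as an assumption of the sketch).
constant = abs(min(sigma_deviates))
weights = integer_weights(sigma_deviates, constant)
agreement = pearson_r(weights, arbitrary_weights)

print(weights)
print(round(agreement, 3))
```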

Standard-score Weighting Method. Rundquist and Sletto describe a
third way in which item-scoring weights can be derived. They
suggest that scoring values can be based upon the extent to which
response values in the arbitrary scoring system deviate, in
standard-score units, from the mean rating on the item. This procedure
requires the assignment of an arbitrary system of weights, e.g., 1 to
5; a count of the number of times each alternate response is given;
the determination of the mean rating assigned; the calculation of the
standard deviation of the distribution of ratings around this mean
value; and the determination, in sigma-score units, of the difference
between the response values 1, 2, 3, 4, and 5 and the mean rating.
These differences, or some linear function of them, become our
scoring weights. The calculations involved in this procedure are
illustrated in Table 51.

Table 51. Calculations Involved in the Standard-score Weighting Method*


Response               x     f     fx    fx²   (x - m)/σ   (x - m)/σ + c

Strongly approve       5    13     65    325      1.41         3.44
Approve                4    43    172    688       .55         2.58
Undecided              3    21     63    189      -.31         1.72
Disapprove             2    13     26     52     -1.17          .86
Strongly disapprove    1    10     10     10     -2.03         0.00

Parameters: Mean = 3.36. Standard deviation = 1.16


* From Rundquist, E. A., and Sletto, R. F. Personality in the Depression. Minneapolis,
Minn.: University of Minnesota Press, 1936.

Unfortunately, this procedure accomplishes nothing that the
original system of arbitrary weights cannot itself accomplish. The
standard scores derived are simple, direct, and perfect linear
transformations of the original arbitrary weights. Therefore, if an
arbitrary system of weights is to be used, there is no point in
transforming them into standard scores. We may as well save
ourselves the computational labor and use the arbitrary scores directly.
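The calculations of Table 51, and the reason they add nothing to the arbitrary weights, can be sketched in Python. The function name `standard_score_weights` is our own, and the added constant is assumed to equal the largest negative standard score, which is what the last column of Table 51 appears to show.

```python
from math import sqrt

# Arbitrary weight -> number of subjects giving that response (Table 51).
frequencies = {5: 13, 4: 43, 3: 21, 2: 13, 1: 10}


def standard_score_weights(frequencies):
    """Return the mean, the standard deviation, and (x - m)/sigma per weight."""
    n = sum(frequencies.values())
    mean = sum(x * f for x, f in frequencies.items()) / n
    variance = sum(f * x * x for x, f in frequencies.items()) / n - mean ** 2
    sd = sqrt(variance)
    z_scores = {x: (x - mean) / sd for x in frequencies}
    return mean, sd, z_scores


mean, sd, z_scores = standard_score_weights(frequencies)

# Adding a constant equal to the largest negative value removes minus signs.
shift = -min(z_scores.values())
shifted = {x: z + shift for x, z in z_scores.items()}

# Successive weights differ by the same amount (1 / sd), which is what it
# means for them to be a linear transformation of 1, 2, 3, 4, 5.
steps = [shifted[x + 1] - shifted[x] for x in range(1, 5)]

print(round(mean, 2), round(sd, 2))
print({x: round(w, 2) for x, w in shifted.items()})
```

Since every step between adjacent weights is identical, ranking or correlating subjects by these weights gives the same result as using 1 to 5 directly.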

Scoring of Scale. We use as our scoring weights those derived by 
any one of the three methods we have just described. To get an 
attitude score for one subject, we add together the numerical values 
associated with his responses.

We have already indicated that the sigma-deviate method of 
weighting is the only method which meets the theoretical preconcep- 
tions involved in the method of summated ratings. It is the only 
method which meets these preconceptions, because it is the only 
method which utilizes the responses of the subjects as a basis for 
the determination of the scoring weights. Therefore, if we use the 
arbitrary weighting method or the standard-score method, we are 
not, in reality, making use of the method of summated ratings. 
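Whichever set of weights is adopted, scoring reduces to a lookup and a sum. The sketch below assumes the arbitrary weights 1 to 5 and three hypothetical statements; the names `attitude_score` and `weights_by_item` are our own.

```python
# Arbitrary weights for the five alternate responses.
ARBITRARY_WEIGHTS = {
    "strongly agree": 5,
    "agree": 4,
    "undecided": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

# Hypothetical three-statement scale; each statement could instead carry
# its own sigma-deviate weights.
weights_by_item = {
    "statement 1": ARBITRARY_WEIGHTS,
    "statement 2": ARBITRARY_WEIGHTS,
    "statement 3": ARBITRARY_WEIGHTS,
}


def attitude_score(responses, weights_by_item):
    """Add together the weights associated with one subject's responses."""
    return sum(weights_by_item[item][answer]
               for item, answer in responses.items())


subject = {
    "statement 1": "agree",
    "statement 2": "disagree",
    "statement 3": "strongly agree",
}
score = attitude_score(subject, weights_by_item)  # 4 + 2 + 5
print(score)
```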

Item Analysis. We can secure item-consistency data in several 
ways. We can compare the mean item scores of high- and low-scoring 
subjects or compute the correlation between each item and total 
scores, or we can go through either one of these processes with 
subjects scoring high and low upon the basis of an independent 
criterion. If we choose to compute the correlation between each item 
and total scores, we can use tetrachoric, biserial, contingency, or 
Pearsonian coefficients.

If we wish to compute tetrachoric correlations, we divide our
cases into above-average and below-average groups, and, at the
same time, into “favorable” answer and “unfavorable” answer
groups. When we have done this, we convert our figures into
percentages, and, utilizing the Chesire, Thurstone, and Saffir
Computing Diagrams, we determine the value of our coefficient.

If we wish to compute a biserial coefficient, we divide our subjects
into “favorable” and “unfavorable” answer groups and, at the
same time, into 10 or 15 groups upon the basis of their total scores.
We compute the standard deviation of the distribution of total
scores; the mean score for those who answered favorably and the
mean score for those who answered unfavorably; the percentage
of cases who answered favorably and the percentage of cases who
answered unfavorably; and the value of the ordinate at the point
of the curve dividing the unfavorable and favorable answer groups.
We enter these values in the biserial formula to get our coefficient.
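The quantities just listed can be combined as in the sketch below. It assumes the customary biserial formula, r = ((M_fav - M_unfav) / σ) × (pq / y), which the text names but does not print, and it works from ungrouped total scores rather than from 10 or 15 score groups. The total scores are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial(favorable_scores, unfavorable_scores):
    """Biserial correlation between a two-way answer and total scores."""
    all_scores = favorable_scores + unfavorable_scores
    sigma = pstdev(all_scores)               # S.D. of all total scores
    p = len(favorable_scores) / len(all_scores)
    q = 1.0 - p
    unit_normal = NormalDist()
    # Ordinate of the unit normal curve at the point dividing it into
    # the proportions p and q.
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    difference = mean(favorable_scores) - mean(unfavorable_scores)
    return (difference / sigma) * (p * q / y)


# Hypothetical total scores for subjects answering one item favorably
# and unfavorably.
favorable = [30, 34, 24, 36, 27, 31]
unfavorable = [22, 29, 25, 26]
r_bis = biserial(favorable, unfavorable)
print(round(r_bis, 2))
```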

If we compute a Pearsonian coefficient, we can have only five
divisions on the “response” axis, but we may have any number we
wish on the “total score” axis. Usually, however, it is pointless for
us to have more divisions on this axis than we can have on the
“response” axis.

To compute a contingency coefficient, we arrange the data in the
same manner as we would to compute a tetrachoric coefficient. Then
we calculate the number of cases which we can expect, by chance, to
enter each cell of our diagram. We compare these expected
percentages with those we obtain and from this comparison compute
our coefficient of contingency. It shows us the extent to which we
must allow chance as a causative agent in producing our results.

The factors determining which of the foregoing coefficients we
should use are the assumptions we wish to make, the number of cases
in our subject groups, and our own preferences regarding the
computational details involved.

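The contingency calculation described above can be sketched as follows, assuming Pearson's coefficient of contingency, C = sqrt(χ² / (N + χ²)), which the text does not spell out. The fourfold table is hypothetical.

```python
from math import sqrt


def contingency_coefficient(table):
    """Coefficient of contingency for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Number of cases expected in this cell by chance alone.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return sqrt(chi_square / (n + chi_square))


# Hypothetical data: rows are above-average and below-average total
# scores; columns are "favorable" and "unfavorable" answers.
observed = [[40, 10],
            [15, 35]]
c = contingency_coefficient(observed)
print(round(c, 3))
```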
A second method of securing item-consistency data is that of
comparing the average scores of the highest and lowest scoring subjects.
This is the method used by Likert in his original study and also by
Rundquist and Sletto in theirs. We can select for this purpose, as we
indicated in the last chapter, the top and bottom 25 per cent or any
other convenient proportion. Likert used the upper and lower 10
per cent in his study, and Rundquist and Sletto used the upper and
lower 25 per cent in theirs.

When we have decided upon the proportion of cases to assign to
our high- and low-scoring groups, we can proceed with our
comparisons. Rundquist and Sletto illustrate the procedure with a set
of data for 184 subjects. First they used the Likert arbitrary
weighting system in scoring. Then for each of their six scales they selected
the 46 highest scoring subjects (the top 25 per cent) and the 46
lowest scoring subjects (the bottom 25 per cent) as criterion groups.
They computed, for each of these groups, the numerical values of
their mean responses to each statement. These calculations are
illustrated in Table 52. The difference between the mean response
for the high-scoring group and the mean response for the low-scoring
group is used as a measure of item consistency. This difference
Rundquist and Sletto call the item scale value difference.

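The computation of an item scale value difference can be sketched with the frequencies of Table 52. The function names are our own, and the first entry for the low-scoring group is inferred from the column total of 18. Because the sketch subtracts unrounded means, it gives 1.667 where the table, working from rounded means, shows 1.666.

```python
# Arbitrary weight -> number of subjects giving that response (Table 52).
# The count for weight 1 in the low group is inferred from the total of 18.
low_group = {1: 2, 2: 14, 3: 1, 4: 1, 5: 0}
high_group = {1: 0, 2: 3, 3: 3, 4: 8, 5: 4}


def mean_response(frequencies):
    """Mean weighted response of one criterion group to one statement."""
    n = sum(frequencies.values())
    return sum(weight * f for weight, f in frequencies.items()) / n


def scale_value_difference(high, low):
    """Mean response of the high group minus that of the low group."""
    return mean_response(high) - mean_response(low)


difference = scale_value_difference(high_group, low_group)
print(round(mean_response(low_group), 3))
print(round(mean_response(high_group), 3))
print(round(difference, 3))
```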
Inconsistent Statements. When our indices of item consistency have
been computed, we review them for the purpose of identifying items
giving results inconsistent with total scores on our test. All
inconsistent items, regardless of what other merits they possess, are
discarded.

Both the correlational method and the item scale value difference 
method of item analysis enable us to detect and to eliminate items 
which do not give results consistent with total scores. Yet, as we 
have described them, they are both open to question. One question 


Table 52. Calculations Involved in Computing an Item Scale Value Difference for the
Item “No one cares much what happens to you”*


                                 Lowest 25 per cent    Highest 25 per cent
Response             Weight         f        fx           f        fx

Strongly agree          1           2         2           0         0
Agree                   2          14        28           3         6
Undecided               3           1         3           3         9
Disagree                4           1         4           8        32
Strongly disagree       5           0         0           4        20

Total                              18        37          18        67

Mean                                  2.056                 3.722
Scale value difference                3.722 - 2.056 = 1.666


* From Rundquist, E. A., and Sletto, R. F. Personality in the Depression. Minneapolis,
Minn.: University of Minnesota Press, 1936.


is raised by the fact that all items in our analyses have themselves 
contributed to the total scores. Therefore any correlation we com- 
pute or any item scale value difference we determine is bound to be 
spuriously high. We could, however, get around this difficulty by 
eliminating each item’s contribution to the total score before com- 
puting our correlations. 
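The correction just described can be sketched as follows. Each subject's score on the item is subtracted from his total before the correlation is computed. The response matrix is hypothetical, and the function names are our own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses, item):
    """Return (uncorrected, corrected) item-total correlations."""
    item_scores = [subject[item] for subject in responses]
    totals = [sum(subject) for subject in responses]
    # Total score with the item's own contribution removed.
    rest = [total - score for total, score in zip(totals, item_scores)]
    return pearson_r(item_scores, totals), pearson_r(item_scores, rest)


# Hypothetical data: six subjects, four statements scored 1 to 5.
responses = [
    [5, 4, 5, 4],
    [4, 4, 3, 5],
    [3, 2, 3, 3],
    [2, 3, 2, 1],
    [1, 2, 2, 2],
    [4, 5, 4, 3],
]
uncorrected, corrected = item_total_correlations(responses, item=0)
print(round(uncorrected, 3), round(corrected, 3))
```

In these illustrative data the uncorrected coefficient is the larger of the two, which is the spurious inflation the paragraph above describes.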

A second question relates to the possible instability of our indices 
of item consistency for different groups of subjects. Rundquist and 
Sletto report two interesting findings in this connection. The first is 
to the effect that the average item scale value difference for the 
items in their scales remained at approximately the same level for 
10 different groups of subjects. The second shows that the rank order 
of the individual items as determined from their consistency values 
changed materially among these same groups. These findings are 
difficult to interpret clearly, however, because of the spurious factor 
we have already mentioned.
Irrelevant Statements. Inconsistent items are not the only type of
item we want eliminated from our completed scales. We also want to 
eliminate items that are irrelevant to the continuum which we are 
interested in measuring. In the Likert technique the only index we 
can use for this purpose is our index of item consistency. To what 
extent does this index enable us to eliminate irrelevant as well as 
inconsistent statements? To answer this question Rundquist and 
Sletto computed for the statements in their morale scale their scale- 
value differences in all other scales. Later they did this separately, 
and in turn, for all other scales. They found that items discriminate 
best in their own scales and less well in other scales. Rundquist and 
Sletto took this result to mean that most of the items were placed 
by them in the scale for which they were most highly relevant. We 
must demur. When the scale-value differences were computed for
the economic conservatism scale, these selfsame items were
instrumental in the segregation of the criterion groups. These items were
not involved, however, in the segregation of the criterion groups on
the other scales. If they had been, the results might well have been
different.


In our discussion of the Thurstone technique of equal-appearing
intervals, we mentioned, just briefly, Guttman’s attempt to devise a
method of measuring item irrelevancy. We can now describe
Guttman’s technique, for it is based directly upon several of the concepts
involved in Likert’s method of summated ratings. We can best
explain Guttman’s technique by working through a hypothetical
problem. Let us suppose that we have tested 100 subjects and have
found that their answers on one of our Likert-type statements are
distributed as follows:


Strongly agree............. 20%
Agree...................... 20
Undecided.................. 20
Disagree................... 20
Strongly disagree.......... 20
                           100%


Now if the item on which we obtained this distribution is a perfect
measure of the attitude, we should find that the 20 subjects who
gave the answer “strongly agree” should have the highest total
scores on our scale. The 20 subjects who gave the answer “agree”
should have the next highest total scores, and so on. Thus, knowing
total scores, we should be able to make a perfect prediction of item
responses. Or, knowing item responses, we should be able to say
which of our subjects secured the 20 highest scores, which of our
subjects secured the 20 next highest scores, and so on down the line.

To dispel the idea that equal numbers of alternative responses are 
necessary, let us look at a second distribution. 


Strongly agree............. 10%
Agree...................... 20
Undecided.................. 40
Disagree................... 20
Strongly disagree.......... 10
                           100%


If the item which elicited this distribution is a perfect measure 
of the attitude, we should find that the 10 subjects giving the answer 
“strongly agree” should have the 10 highest total scores. And the 
20 subjects who answer “agree” should have the 20 next-highest 
total scores, and so on.

Now let us consider both of our items together. The 10 subjects 
with the highest total scores should be those who respond “strongly 
agree” to both items. The 10 subjects with the 10 next highest scores 
should be those who respond “strongly agree” to item 1 and “agree”
to item 2. The 10 subjects with the 10 next highest scores should be 
those who respond “agree” to both items. Pursuing this line of 
reasoning for all combinations of responses, we find these combina- 
tions related to total scores as in Table 53. 


Table 53. Theoretical Joint Distribution of Two Item Responses


Rank on          Response              Response
total scores     to item 1             to item 2

  1- 10          Strongly agree        Strongly agree
 11- 20          Strongly agree        Agree
 21- 30          Agree                 Agree
 31- 40          Agree                 Undecided
 41- 60          Undecided             Undecided
 61- 70          Disagree              Undecided
 71- 80          Disagree              Disagree
 81- 90          Strongly disagree     Disagree
 91-100          Strongly disagree     Strongly disagree


Guttman says that if we can achieve results like these (within 
reasonable limits, of course), the universe of content is scalable. In 
other words, all items are relevant to the attitude variable in ques-
tion. When such results are not achieved, Guttman says the universe 
of content is not scalable. In other words, many of the individual 
items are irrelevant to the attitude variable in question. 
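The reasoning by which Table 53 is built from the two response distributions can be sketched in Python. The sketch assumes 100 subjects, so that percentages and ranks coincide, and it assumes both items are perfect measures in the sense described above. The function names are our own and are not part of Guttman's procedure.

```python
RESPONSES = ["Strongly agree", "Agree", "Undecided", "Disagree",
             "Strongly disagree"]


def cut_points(percentages):
    """Rank at which each response category ends (1 = highest total)."""
    points, running = [], 0
    for pct in percentages:
        running += pct
        points.append(running)
    return points


def response_at(rank, cuts):
    """Response given by the subject who holds the stated rank."""
    for response, cut in zip(RESPONSES, cuts):
        if rank <= cut:
            return response
    raise ValueError("rank exceeds the number of subjects")


def joint_distribution(item_1, item_2):
    """Rows of (first rank, last rank, response 1, response 2)."""
    cuts_1, cuts_2 = cut_points(item_1), cut_points(item_2)
    # Every rank at which either item changes category ends a row.
    boundaries = sorted(set(cuts_1) | set(cuts_2))
    rows, start = [], 1
    for end in boundaries:
        rows.append((start, end,
                     response_at(end, cuts_1), response_at(end, cuts_2)))
        start = end + 1
    return rows


item_1 = [20, 20, 20, 20, 20]   # first distribution in the text
item_2 = [10, 20, 40, 20, 10]   # second distribution in the text
table_53 = joint_distribution(item_1, item_2)
for first, last, r1, r2 in table_53:
    print(f"{first:>3}-{last:>3}  {r1:<18} {r2}")
```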

Another way of putting Guttman’s results is to say that if there 
is perfect correlation between each item and total scores, the uni- 
verse of content is scalable. Therefore it appears that Guttman has 
done little more than to set up a rather elaborate set of procedures 
to determine the correlation between an item and the total score to 
which it contributes. This result can be obtained just as well by the 
computation of a biserial, tetrachoric, contingency, or Pearsonian 
coefficient. 

If Guttman’s criterion of scalability can be met, why is more than
one item needed in a scale? We conjecture that Guttman’s answer
to this question might be something like this: first, to counteract
random error and second, because one item gives, let us say, only
5 discriminations. A second item with its 5 responses gives, in
combination with the first item, a possible total of 25 discriminations. It
will do this, however, only when the cumulative percentages for
the two items do not agree with each other. When these percentages
agree for every response, the second item adds no information over
and above that yielded by the first. We would merely have succeeded
in asking the same question in two different, but equally successful,
ways.

A questionable assumption in the Guttman technique is that a
universe of content must be considered unscalable if it is found that
total scores cannot be predicted almost perfectly from item
responses. This assumption is questionable, because the technique
involves no check upon the wisdom with which the items in the
scale were originally selected. This need not overly concern us as
far as the Thurstone-type scale is concerned, but when Guttman’s
technique is applied to a Likert-type scale, we shall find no other
check upon the value of the items. One investigator may just happen
to have selected a set of items that will meet Guttman’s criterion,
but another investigator, in attempting to cover the same area,
may have been less fortunate in his selection and may have a
group of items which does not meet Guttman’s criterion. This latter
set of items can be considered unsatisfactory, but to conclude that
the universe of content is not scalable seems unwarranted.

Rundquist and Sletto were careful to include in their scales equal
numbers of positive and negative statements. Positive statements 
are those phrased so that agreement with them indicates a favorable 
attitude on the underlying continuum. Negative statements are 
those phrased so that agreement with them indicates an unfavorable 
attitude on the underlying continuum. Our discussion on item rel- 
evance suggests that we ought to inquire as to whether we get 
different or the same results from these two types of item. 

Rundquist and Sletto report that subjects tend to disagree with 
negative items to a greater extent than they tend to agree with 
positive items. Rundquist and Sletto also report that negative 
statements yield distributions more heavily weighted at the favor- 
able end of the scales (hence, the means are lower); that responses 
to negative statements tend to be more internally consistent and 
therefore tend to yield larger standard deviations; that negative 
statements tend to yield more consistent responses from test to 
retest; and that positive and negative statements correlate low 
within a single scale but correlate equally well with total scores. 

Rundquist and Sletto suggest that the difference between the 
scores on positive and negative statements might be found useful 
as a measure of adjustment. They predicate this on the hypothesis 
that negative statements may create tension or conflict. If this is 
true, the greater the discrepancy between the scores on positive 
and negative statements, the greater the conflict or tension created 
and the greater the maladjustment. Rundquist and Sletto explored 
this possibility to some extent, but they had no satisfactory criterion
of maladjustment. It is to be hoped that this interesting lead can
someday be put to further test.


Norms. When all inconsistent and irrelevant items have been 
eliminated, the items remaining in the scale are rescored. Then norms 
are prepared. Norms for Likert-type attitude scales can be prepared 
in the same way as those for most other psychological scales. They 
can also be prepared, however, in a manner suggested by Guttman. 
This method enables us to determine the degree to which any given
attitude departs from a psychologically and rationally determined
point of neutrality.


To determine this point of neutrality it is necessary for us to dis-
tinguish between an attitude per se and its intensity. For example, A
has a very favorable attitude toward war and feels intensely that
he is right. B, on the other hand, has a very unfavorable attitude
toward war and feels intensely that he is right. A and B differ in
the content of their attitudes or in their positions on the attitude
continuum, but they occupy the same position on an intensity
continuum. Guttman suggests that this nonlinear relationship
between an attitude and its intensity be used as a basis for deter- 
mining the point of neutrality on the total score continuum. This 
point is to be found, says Guttman, where intensity has its lowest 
mean value. 

Fold-over Technique. Guttman describes two methods of determin-
ing intensity values. One is called the fold-over technique, and the
other is called the two-part technique. In the first, the fold-over
technique, Guttman assumes that the point of neutrality for each
statement is located at the response called “undecided” and that
there are equal increases in intensity in both directions from this
neutral point. When five responses are involved, the “undecided”
position is assigned a value of 0, the responses “agree” and
“disagree” are assigned a value of 1, and the responses “strongly agree”
and “strongly disagree” are assigned a value of 2. Total intensity
scores for all subjects are obtained by addition. When these scores
have been obtained, a scatter plot showing their relation to the
attitude scores is prepared. An example of such a scatter plot is
shown in Table 54. We compute the mean intensity value for each
vertical array of attitude scores and take as our point of neutrality
the attitude score associated with the lowest intensity level. In our
illustration this value turns out to be 4.
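The fold-over procedure can be sketched as follows. The subjects and their responses are hypothetical and do not reproduce Table 54; content is scored here with the arbitrary weights 1 to 5, which is an assumption of the sketch, and the function name `neutral_point` is our own.

```python
from collections import defaultdict

# Content weights (assumed: the arbitrary series 1 to 5) and fold-over
# intensity weights for the five alternate responses.
CONTENT = {"strongly agree": 5, "agree": 4, "undecided": 3,
           "disagree": 2, "strongly disagree": 1}
INTENSITY = {"strongly agree": 2, "agree": 1, "undecided": 0,
             "disagree": 1, "strongly disagree": 2}


def neutral_point(subjects):
    """Return the content score with the lowest mean intensity."""
    by_content = defaultdict(list)
    for responses in subjects:
        content = sum(CONTENT[r] for r in responses)
        intensity = sum(INTENSITY[r] for r in responses)
        by_content[content].append(intensity)
    # Mean intensity for each vertical array of content scores.
    mean_intensity = {c: sum(v) / len(v) for c, v in by_content.items()}
    neutral = min(mean_intensity, key=mean_intensity.get)
    return neutral, mean_intensity


# Hypothetical answers of eleven subjects to a two-statement scale.
subjects = [
    ["strongly agree", "strongly agree"],
    ["agree", "strongly agree"],
    ["agree", "agree"],
    ["agree", "undecided"],
    ["undecided", "undecided"],
    ["undecided", "undecided"],
    ["agree", "disagree"],
    ["undecided", "disagree"],
    ["disagree", "disagree"],
    ["strongly disagree", "disagree"],
    ["strongly disagree", "strongly disagree"],
]
neutral, mean_intensity = neutral_point(subjects)
print(neutral)
```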
Two-part Technique. This requires that responses for intensity be
secured separately from those for attitude content. This procedure
has two advantages. It makes certain that intensity responses are
independent of (this does not mean unrelated to) the content
responses, and it avoids the assumption of the fold-over technique that
the responses “strongly agree” and “strongly disagree” are equi-
distant from the point of neutrality. In the two-part technique the
actual distances can be determined through Likert’s sigma-deviate
method of scoring. When these scoring values have been determined,
total intensity scores can be secured, and the remainder of the
procedure follows as before. When we have determined our point of
neutrality, we can determine how favorable or how unfavorable
any given attitude is to be considered by noting how far, and in
what direction, it departs from this point of neutrality.


Table 54. The Relation between Attitude Content and Intensity*

[Scatter table of intensity scores (rows, 14 down to 3) against content
scores (columns 0-2, 3-5, 6-8, 9-10, 11, 12-13, and 14), with marginal
totals. The cell entries are not recoverable.]

* From Churchman, C. W., Ackoff, R. L., and Wax, M. (Eds.) Measurement of Con-
sumer Interest. Philadelphia: University of Pennsylvania Press, 1947.


ADVANTAGES OF THE METHOD OF SUMMATED RATINGS 


The advantages which Likert claims for the method of summated 
ratings, in contrast with the Thurstone method of equal-appearing 
intervals, are that the scale is easier to construct, that no preliminary 
judging group is necessary, that greater reliability is secured, that 
the scoring system can be better adapted to groups whose attitudes 
are to be measured, and that it does not require certain of the 
assumptions inherent in the Thurstone technique. Let us review 
these claims and see to what extent they can be substantiated. 

Simplified Scale Construction. The ease with which a scale can 
be constructed depends upon the insight of the investigator in 
selecting useful and significant statements. It also depends upon 
the number of statements to be collected, the number of statements 
to be included in the completed scale, and upon the number of sub- 
jects whose records are to be included in the scale-standardization 
process. And when the method is applied strictly, the items must be 
rescaled for every group whose attitudes are to be measured. In 
view of these facts it is difficult to see how the construction of a
Likert-type attitude scale can be, except by chance, much less 
laborious than the construction of a Thurstone-type attitude scale. 
Perhaps we should not attempt to settle here the issue which Likert 
has raised, but we can certainly conclude that there is some question 
about the claim which he makes. 

Eliminates Judging Group. The second advantage which Likert 
claims for his technique is that it does away with the need for a 
preliminary judging group. The purpose of the judging group is to 
ensure the selection of statements scaled at equal intervals through- 
out the attitude continuum. Therefore if Likert’s technique elimi- 
nates the need for a judging group, we should find items selected 
according to the Likert technique fairly evenly spaced throughout 
the attitude continuum. Is this result obtained? There have been 
two attempts to answer this question: that by Ferguson and that by
Edwards and Kenney.


Ferguson asked 100 University of Connecticut students to rate the 


statements in Rundquist and Sletto’s Survey of Opinions by the 
Seashore-Hevner method. Following Thurstone and Chave’s meth- 
ods, equal-appearing-interval scale values of these items were 
computed. The results are given in Fig. 5. It is obvious that the 
statements in most of the scales represent only the very favorable 
or the very unfavorable portions of the continuum. In spite of the 
care exercised by Rundquist and Sletto in the construction of their 
survey, they achieved little success in securing statements represent- 
ing all degrees of attitude along each continuum. In view of these 
results we venture the opinion that Likert’s technique does not do 
away with the need for a preliminary group of judges. he 
Edwards and Kenney have criticized this conclusion by pointing 
out that it is based upon a Thurstone-type rescaling of items which 
were designed originally for a Likert-type scale. Therefore they 
started with Thurstone and Chave’s original list of the 130 items 
used to construct their scale measuring attitude toward the church 
and put them through both the Thurstone and Likert procedures.
The subjects were 80 students at the University of Maryland.

Edwards and Kenney selected two sets of 20 items to constitute
two alternate forms of their Thurstone-type scale and one set of 25
items to constitute their Likert-type scale. Only five items were
found to be common to the two scales. Edwards and Kenney then
computed the correlations between the scores secured from the
Likert-type and Thurstone-type scales and reported coefficients of
.79 and .92. They seized upon the latter correlation as demonstrating
“the fact that it is possible to construct scales by the two methods
which will yield comparable scores.” But their coefficient of .79
demonstrates that it is also possible to construct scales by the two 
methods which will not yield completely comparable scores. We 
are left with our original conclusion that the Likert method of 
attitude-scale construction does not obviate the need for a group of 
preliminary judges.

Fig. 5. Distributions of the scale values for the statements in the Minnesota
Survey of Opinions. (From Ferguson, L. W. A study of the Likert technique of
attitude-scale construction. J. soc. Psychol., 1941, 31, 51-57.)


Greater Reliability. The third proposed advantage of the Likert 
technique is that it produces scales of greater reliability than the 
Thurstone technique. Likert offers in favor of this claim an attained
reliability of .88 for his 24-item internationalism scale as compared 
with reliabilities of .78 and .74 for the 22-item Droba scale for the 
measurement of attitude toward war. Likert makes much of the 
fact that to attain a reliability of .88 with the Droba scale, it would 
be necessary to use both forms of the scale, or a total of 44 items. 
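
Likert's remark about the 44 items rests on the usual relation between the length of a test and its reliability. The sketch below assumes that the Spearman-Brown prophecy formula is the relation intended (the text does not name it); the function names `spearman_brown` and `length_factor` are illustrative only. It shows the reliability predicted when each Droba form is doubled and the lengthening that would be needed to reach .88.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a test of reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


def length_factor(r: float, target: float) -> float:
    """Factor by which a test of reliability r must be lengthened to reach target."""
    return target * (1 - r) / (r * (1 - target))


# Reliabilities quoted in the text for the two 22-item forms of the Droba scale.
droba_forms = [0.78, 0.74]

for r in droba_forms:
    doubled = spearman_brown(r, 2)    # both forms together, 44 items
    needed = length_factor(r, 0.88)   # lengthening needed to reach .88
    print(f"r = {r:.2f}: doubled -> {doubled:.2f}, factor for .88 -> {needed:.2f}")
```

Under this assumption the form with a reliability of .78 reaches about .88 when doubled, which agrees with the figure Likert cites.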

In discussing reliability, let us keep clearly in mind the fact that 
in the typical Thurstone technique all we ask of a subject is that he 
respond with a check mark (✓) if he agrees with a statement or
with a cross (×) if he disagrees. In the typical Likert technique we
ask our subject to check one of five alternate answers. Therefore it
is not quite fair to say that a 20-item Thurstone scale is the same
length as a 20-item Likert scale. In the latter there are 100 possible 
responses, and in the former there are only 40. In our example the 
Likert scale is really two and one-half times longer than the Thur- 
stone scale. 

Likert realized the nature of this inequality. Therefore he revised 
the Droba scale so it could be used with his five-choice alternate 
answers. However, to do this he found it necessary to eliminate four 
statements from each form of the scale as being unamenable to his 
method of answering. This left 18 items in each form of the scale. 
On this revision Likert secured a reliability of .88 for each of these 
forms. This reliability is equal to that reported by Likert for his 
24-item internationalism scale. So, using data which Likert himself 
provides, we can conclude that the Thurstone and Likert methods 
yield reliabilities of comparable magnitude. 

Scoring Adapted to Group. The fourth advantage which Likert 
claims for his technique is that the scoring can be better adapted to 
the particular group whose attitudes are to be measured. In other 
words we can change the scoring values for each new group whose 
attitudes we wish to measure. It is difficult to see how this change, if
indeed it is to be effected, can be an advantage. In the first place, it
makes it impossible for us to compare the groups with each other,
because the units of measurement will be different. And in the second
place, it requires the labor of recalculating the scale values for each 
group whose attitudes are to be measured. 

Likert might at this point argue that his demonstration that an 
arbitrary scoring system gives the same results as the sigma-deviate 
scoring technique makes this recalculation of weights unnecessary. 
But if Likert is to offer this argument, he must give up the claim 
that his technique makes it possible for us to readapt the scoring 
values for each group. If it is unnecessary to make this adaptation
(and if the arbitrary scoring system works it is unnecessary), the
alleged advantage fails to materialize.

Fewer Assumptions. Finally, Likert states that his technique
does not require some of the questionable assumptions of the Thur- 
stone technique. He mentions chiefly the assumption that the 
attitudes of a rater have no effect upon his evaluation of the state- 
ments. We have already presented evidence indicating that this 
assumption is a legitimate one and need no longer be held in question.
The Likert technique, when strictly applied, involves the assumption
that attitudes are normally distributed. In the Thurstone
method of attitude-scale construction this assumption is not
necessary.

We must conclude that most of the advantages claimed for the 
Likert technique either do not exist or, if they do, must be so seriously
qualified that it is difficult to see in what way the method is
to be preferred to the Thurstone technique. 


6


PERSONALITY: UNIDIMENSIONAL APPROACHES


There are a large number of psychological tests which are supposed 
to measure “personality.” It is unfortunate that we cannot think 
of more discrete and descriptive titles for some of these tests, because 
some are quite limited in scope and do not cover all that is usually 
implied in the general concept personality. Another unfortunate 
consequence of our inadequate nomenclature is that we talk, for 
example, of interest tests, of attitude tests, and of personality tests. 
This makes it look as if attitude and interest tests were not per- 
sonality tests. About all we can do to clear up the confusion is to 
remember that we use the term personality test in two senses: in a 
general way to cover all tests discussed in this volume and in a more 
specific way to connote those tests not given any subclassification,
such as interest or attitude test. 

In this chapter and in Chap. 7, we propose to discuss several 
approaches to the construction of personality tests, using this term 
in its more restricted meaning. We shall divide these tests into two 
categories: one will include unidimensional approaches, and the 
other will include multidimensional approaches. Unidimensional 
approaches are those in which one trait is defined or in which only 
one test score is secured. The trait involved may be narrow in scope 
or fairly broad, but whatever its nature, it is considered as a unitary 
function. Multidimensional approaches are those leading to several
scores from the same set of items. These several scores may purport 
to cover the whole of personality or only a small segment of it. In 
either event different dimensions are covered, and the methods 
involved in developing such tests frequently are different from those
used in the unidimensional approaches.

In this chapter we shall discuss six unidimensional approaches to
the measurement of personality. Five of the tests to be represented 
in our discussion have more historical than current interest, but it is 
important that we understand the methods used in their development.
One reason is that the tests constructed by these methods form the
bases upon which Bernreuter constructed his widely used Person- 
ality Inventory. We shall discuss this inventory in our chapter on 
multidimensional approaches, but we would not be able to under- 
stand its value and its limitations without the material we are to 
present in this chapter. 

The tests which, historically, paved the way for the Bernreuter 
Personality Inventory are the Woodworth Personal Data Sheet, the 
Thurstones’ Personality Schedule, the Bernreuter Self-sufficiency 
Test, Laird’s Colgate Mental Hygiene Test, and the Allports’ 
Ascendance-Submission Reaction Study.

The sixth test we plan to discuss in this chapter is the Terman- 
Miles Masculinity-Femininity Test. The development of this test 
follows a different tradition from that characterizing the other tests 
we have mentioned and will constitute an enlightening contrast to 
them. 


WOODWORTH’S PERSONAL DATA SHEET 


This test can be called the grandfather of practically all present- 
day personality tests. It has many direct descendants and has in- 
spired the development of many other tests, even though not con- 
tributing directly to their content. Woodworth devised the test in 
1917 as a tool for eliminating emotionally unstable soldiers as unfit
for duty in the United States Army. It consists of 116 questions to 
which a subject must answer “Yes” or “No.” 

Woodworth himself apparently never published more than the 
sketchiest of notes, if any, concerning the test, so we shall rely upon 
House’s account for details of its development. According to House,
the development of the test proceeded through five stages. First, 
Woodworth made a list of approximately 200 questions which he 
thought to be symptomatic of psychoneurotic or at least of emo- 
tionally unstable tendencies. These questions were culled from 
pertinent comment and from textbook descriptions. Second, Woodworth
gave this list of questions to a small group of students at
Columbia University and asked them to indicate their answers to
the questions. Third, he reviewed the answers given by these Columbia
students and decided to eliminate any item to which more than
25 per cent of them gave the “psychoneurotic” answer. The reason
for this was Woodworth’s assumption that if so large a proportion
of a supposedly normal group of subjects could give the
“psychoneurotic” answer, the question could not be considered symptomatic
of mental maladjustment. These eliminations reduced the list to a
total of 179 questions. Fourth, this revised list of questions was given
to 1,000 unselected normal draftees (1917) and, says House, to “a
small selected group of declared psychoneurotic soldiers.” Fifth, the
criteria for the elimination of statements were reapplied to these
data, and, as a result, the list was reduced to its final total of 116
questions. Because of its historical importance, we give this list
of questions in Table 55. On this list it was thought that psychoneurotics
would average some 30 or 40 “psychoneurotic” answers,
whereas normals would average only 10.
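
The elimination rule described in the third stage can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name `screen_items`, the data layout, and the sample answers are all hypothetical, and only the 25 per cent criterion comes from House's account.

```python
def screen_items(normal_answers, max_rate=0.25):
    """Return the item ids kept after the screening rule described in the text.

    normal_answers maps an item id to a list of booleans, one per normal
    subject, where True means the subject gave the "psychoneurotic" answer.
    An item is dropped when more than max_rate of the normal group gave it.
    """
    kept = []
    for item, answers in normal_answers.items():
        rate = sum(answers) / len(answers)
        if rate <= max_rate:
            kept.append(item)
    return kept


# Hypothetical answers from eight normal subjects to three made-up items.
sample = {
    "item_a": [True, False, False, False, False, False, False, False],  # 12.5 per cent
    "item_b": [True, True, True, False, False, False, False, False],    # 37.5 per cent
    "item_c": [True, True, False, False, False, False, False, False],   # 25 per cent
}
print(screen_items(sample))
```

An item answered in the "psychoneurotic" direction by exactly 25 per cent of the group is retained here, since the rule as reported removes only items exceeding that proportion.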


Table 55. Questions in Woodworth’s Personal Data Sheet*

1. Do you usually feel well and strong?
2. Do you usually sleep well?
3. Are you often frightened in the middle of the night?
4. Are you troubled with dreams about your work?
5. Do you have nightmares?
6. Do you have too many sexual dreams?
7. Do you ever walk in your sleep?
8. Do you have the sensation of falling when going to sleep?
9. Does your heart ever thump in your ears so that you cannot sleep?
10. Do ideas run through your head so that you cannot sleep?
11. Do you feel well rested in the morning?
12. Do your eyes often pain you?
13. Do things ever seem to swim or get misty before your eyes?
14. Do you often have the feeling of suffocating?
15. Do you have continual itchings in the face?
16. Are you bothered much by blushing?
17. Are you bothered by fluttering of the heart?
18. Do you feel tired most of the time?
19. Have you ever had fits of dizziness?
20. Do you have queer, unpleasant feeli
21. Do you ever feel an awful pressure i
22. Do you often have bad pains in any
23. Do you have a great many bad headaches?
24. Is your head apt to ache on one side?
25. Have you ever fainted away?
26. Have you often fainted away?
27. Have you ever been blind, half-blind, deaf or dumb for a time?
28. Have you ever had an arm or leg paralyzed?
29. Have you ever lost your memory for a time?
30. Did you have a happy childhood?
31. Were you happy when 14 to 18 years old?
32. Were you considered a bad boy?
33. As a child did you like to play alone better than to play with other children?
34. Did the other children let you play with them?
35. Were you shy with other boys?
36. Did you ever run away from home?
37. Did you ever have a strong desire to run away from home?
38. Has your family always treated you right?
39. Did the teachers in school generally treat you right?
40. Have your employers generally treated you right?
41. Do you know of anybody who is trying to do you harm?
42. Do people find fault with you more than you deserve?
43. Do you make friends easily?
44. Did you ever make love to a girl?
45. Do you get used to new places quickly?
46. Do you find your way about easily?
47. Does liquor make you quarrelsome?
48. Do you think drinking has hurt you?
49. Do you think tobacco has hurt you?
50. Do you think you have hurt yourself by going too much with women?
51. Have you hurt yourself by masturbation (self-abuse)?
52. Did you ever think you had lost your manhood?
53. Have you ever had any great mental shock?
54. Have you ever seen a vision?
55. Did you ever have the habit of taking any form of “dope?”
56. Do you have trouble in walking in the dark?
57. Have you ever felt as if someone was hypnotizing you and making you act against your will?
58. Are you ever bothered by the feeling that people are reading your thoughts?
59. Do you ever have a queer feeling as if you were not your old self?
60. Are you ever bothered by a feeling that things are not real?
61. Are you troubled with the idea that people are watching you on the street?
62. Are you troubled with the fear of being crushed in a crowd?
63. Does it make you uneasy to cross a bridge over a river?
64. Does it make you uneasy to go into a tunnel or a subway?
65. Does it make you uneasy to have to cross a wide street or open square?
66. Does it make you uneasy to sit in a small room with the door shut?
67. Do you usually know just what you want to do next?
68. Do you worry too much about little things?
69. Do you think you worry too much when you have an unfinished job on your hands?
70. Do you think you have too much trouble in making up your mind?
71. Can you do good work while people are looking on?
72. Do you get rattled easily?
73. Can you sit still without fidgeting?
74. Does your mind wander badly so that you lose track of what you are doing?
75. Does some particular useless thought keep coming into your mind to bother you?
76. Can you do the little chores of the day without worrying over them?
77. Do you feel you must do a thing over several times before you can drop it?
78. Are you afraid of responsibility?
79. Do you feel like jumping off when you are on a high place?
80. At night are you troubled with the idea that somebody is following you?
81. Do you find it difficult to pass urine in the presence of others?
82. Do you have a great fear of fire?
83. Do you ever feel a strong desire to go out and set fire to something?
84. Do you ever feel a strong desire to steal things?
85. Did you ever have the habit of biting your fingernails?
86. Did you ever have the habit of stuttering?
87. Did you ever have the habit of twitching your face, neck or shoulders?
88. Did you ever have the habit of wetting the bed?
89. Are you troubled with shyness?
90. Have you a good appetite?
91. Is it easy to make you laugh?
92. Is it easy to get you angry?
93. Is it easy to get you cross or grouchy?
94. Do you get tired of people quickly?
95. Do you get tired of amusements quickly?
96. Do you get tired of work quickly?
97. Do your interests change frequently?
98. Do your feelings keep changing from happy to sad to happy without any reason?
99. Do you feel sad or low-spirited most of the time?
100. Did you ever have a strong desire to commit suicide?
101. Did you ever have heart disease?
102. Did you ever have St. Vitus’s dance?
103. Did you ever have convulsions?
104. Did you ever have anemia badly?
105. Did you ever have dyspepsia?
106. Did you ever have asthma or hay fever?
107. Did you ever have a nervous breakdown?
108. Have you ever been afraid of going insane?
109. Has any of your family been insane, epileptic or feeble-minded?
110. Has any of your family committed suicide?
111. Has any of your family had a drug habit?
112. Has any of your family been a drunkard?
113. Can you stand pain quietly?
114. Can you stand the sight of blood?
115. Can you stand disgusting smells?
116. Do you like out-door life?

* From Woodworth, R. S. Personal Data Sheet. Chicago: C. H. Stoelting Co., 1918.


The Woodworth Personal Data Sheet has been subjected to
numerous revisions. Many of these will be of no concern to us here,
however, as they involve no change in the logic or technique of test
construction. Among the revisions in this category, we may list
those by Johnson in 1920, by Mathews in 1923, by Cady in 1923, by
House in 1927, and by Papurt in 1930.

The revisions in which we shall interest ourselves in this text are 
those which involve an important change in the logic or methodology 
or which have led to the development of later tests. Among the 
revisions in this category we may list those by Laird (the Colgate 
Mental Hygiene Inventory), by Gordon W. Allport and Floyd H. 
Allport (the Ascendance-Submission Reaction Study), and by 
Louis L. Thurstone and Thelma Gwinn Thurstone (the Personality 
Schedule). All of these revisions we shall discuss in this chapter. Two 
later tests owing their existence to the Woodworth Personal Data 
Sheet are the Bernreuter Personality Inventory (which we shall 
discuss in Chap. 7) and the Bell Adjustment Inventory (which we 
shall discuss in Chap. 8). 


THE COLGATE MENTAL HYGIENE TEST 


Our second example of a unidimensional approach to the measure- 
ment of personality is that contained in the Colgate Mental Hygiene 
Test. This test was developed by Donald A. Laird and was published 
in 1925. Laird’s purposes were to get “a fairly reliable, objective
and valid method of spotting persons in need of mental hygiene,”
and “to provide an instrument which would give a fairly precise
quantitative measure of the degree and kind of deviation” from a
normal group. 

The test consists of two separate schedules. The first of these, the 
B1 schedule (or in a later revision the B2 schedule), consists of 75 
questions about psychoneurotic tendencies. And the second schedule, 
the C1 schedule (or in a later revision the C2 schedule), consists
of 53 questions about introversion-extroversion.

In constructing the Colgate Mental Hygiene Test, Laird theorized
that “all the traits which are characteristic of mental ill health are
but exaggerations of traits of behavior present in all humans.”
Therefore, Laird continues, “the method ... is to have those
traits which are significant as indicators of mental deviation so
described for each individual that one can determine whether or not 
the person being examined deviates from the normal in these traits.”

Development. There were just five steps involved in the construction
of the Colgate Mental Hygiene Test. These were the collection
of a list of statements, their editing, the giving of the test to an 
experimental population, the determination of deviant answers, and 
the preparation of percentile norms. 

Laird took most of his items from Woodworth’s Personal Data 
Sheet and worded them so that they could be answered by means 
of a check mark on a graphic scale. The questions were printed on 
the left-hand side of the test booklet, and the graphic scales were 
printed on the right. An example is given below: 


HAVE YOU BEEN      . . . . . . . . . . . . . . . . . . . . . . . . . .
BOTHERED BY        Not to         Only when bawled       Many times
BLUSHING?          know it        out or such            each day

Directions. The schedules were given to more than 2,000 students
with these instructions:

In the large type to the left (look at the blank) questions are asked. You will
answer these by making a check mark like this / along the dotted line at the place
which indicates the right answer for you. One end of this dotted line might be
thought of as low, the other high, with each dot representing a step from one
extreme to the other.

To help you locate yourself some descriptive phrases are printed below the
dotted line. It may be that none of these phrases describe you, in which case the
check may be placed between two of the phrases. Check at any dot, using the
phrases to help you locate the proper dot easily.

We want to know about your personality for the last few months; say the last
six or so. After you read each question in the larger type and then the descriptive
phrases, think how you have been in the past half year, not how you would like
to have been or how you think the ideal person should have been, but how you
actually were.

You may use the margins and blank spaces to write in any explanation you
want to, but always make a check mark some place along each line of dots.

After you are through you may study it over again and make any changes you
think necessary. Do not erase, however; just cross out your first check marks
and make new ones.

Remember: you may check any place along the dotted line.

Think how you have been the past half year.

Scoring. Laird prepared distributions of the answers (check
marks) for each question and determined on the graphic scales the
points which indicated the top and bottom quarters of the distributions.
Then he prepared scoring stencils indicating for each question
the portion of each graphic scale included in the deviant quarters
of the distributions. A subject’s score consisted of the number of
answers placed in these deviant quarters. When all schedules had
been scored, Laird prepared four sets of percentile norms: two for
Schedule B1 and two for Schedule C1. One of the two sets of norms
for each schedule was for men, and the other was for women.
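
The scoring procedure just described can be illustrated with a short sketch. Several things in it are assumptions and not Laird's own procedure: check marks are recorded as dot numbers along the graphic scale, both extreme quarters are counted as deviant (the text leaves open whether one or both were used for a given question), and the question names and norm data are invented. The quartile points are found with `statistics.quantiles` from the Python standard library.

```python
from statistics import quantiles


def deviant_cutoffs(norm_positions):
    """Return the lower and upper quartile points of one question's answers."""
    q1, _median, q3 = quantiles(norm_positions, n=4)
    return q1, q3


def score(answers, cutoffs):
    """Count the answers that fall in either extreme quarter."""
    total = 0
    for question, position in answers.items():
        low, high = cutoffs[question]
        if position < low or position > high:
            total += 1
    return total


# Hypothetical norm data: check-mark positions (dot numbers 1 to 20)
# from 20 students on each of three made-up questions.
norms = {
    "blushing": list(range(1, 21)),
    "worry": list(range(1, 21)),
    "fidgeting": list(range(1, 21)),
}
cutoffs = {question: deviant_cutoffs(p) for question, p in norms.items()}

# One subject's check marks; the first and third lie in extreme quarters.
subject = {"blushing": 3, "worry": 10, "fidgeting": 18}
print(score(subject, cutoffs))
```

The dictionary `cutoffs` plays the part of the scoring stencil: it records, for each question, the portion of the scale that lies outside the middle half of the norm group's answers.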

Reliability. Laird secured reliability data in three ways: by com- 
paring the results for duplicate sets of questions, by correlating the 
scores on comparable halves of the schedules, and by the test-retest 
method. All three methods gave evidence of only moderate degrees 
of reliability. 

The comparisons of duplicate sets of questions consisted in the 
observation that the distributions on these duplicate questions 
tended to the same form. Hoitsma, who reported these results for 
Laird, argues that, since the distributions on most of these questions 
were non-normal, their similarity can be taken as evidence of sub- 
stantial reliability. Whether the reader accepts this argument or not 
(and the author doesn’t), it applies to only five of the questions in 
the two schedules and has little relevance to the reliability of the 
total scores. 

The second determination of reliability was through the correla- 
tion of the scores on comparable halves of the schedules. In Schedule 
B1 these comparable halves were composed, on one hand, of all the 
questions in the first halves of its several sections and, on the other 
hand, of all the questions in the second halves of these same sections. 
In Schedule C1 the two halves were composed of the odd- and even- 
numbered questions. The correlation between the scores on the 
comparable halves of the B1 schedule was .79, and that for the C1
schedule was .45. Hoitsma, who reported these coefficients, did not 
enter them in the Spearman-Brown Prophecy Formula. Had he 
done so, he would have obtained coefficients of .88 and .62. 
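
The two stepped-up coefficients can be checked directly. In the worked equation below the symbols are introduced only for illustration: r_hh stands for the correlation between the comparable halves and r_tt for the estimated reliability of the whole schedule, using the Spearman-Brown formula for a test doubled in length.

```latex
r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}, \qquad
\text{B1: } \frac{2(.79)}{1 + .79} = \frac{1.58}{1.79} \approx .88, \qquad
\text{C1: } \frac{2(.45)}{1 + .45} = \frac{.90}{1.45} \approx .62
```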

As the third bit of evidence on reliability, Hoitsma reports a
test-retest correlation of .85 for the B1 schedule and one of .67 for 
the C1 schedule. These values must be considered low, since the 
interval involved was only two weeks. 

Distinguishing Features. We find in the Colgate Mental Hygiene
Test two features not found in any of the other tests we shall discuss.
These features are the use of the distribution of answers to
each question as a basis for determining which answers are to have
deviant significance, and the use of a graphic scale for the recording
of the answers. It is undoubtedly the use of the graphic scale feature
that led to the low reliabilities, for other investigators, using practically
the same set of items, have been able to achieve much higher
reliabilities. This fact undoubtedly led to the abandonment of the
graphic scale answering continuum and, with its abandonment, to
that of the possibility of further use of item distributions as a basis
for determining which of the answers were to possess deviant
significance.


THE ALLPORTS' ASCENDANCE-SUBMISSION REACTION STUDY 


Our third example of a unidimensional approach to the measurement
of personality is that exemplified in Gordon W. Allport and
Floyd H. Allport’s Ascendance-Submission Reaction Study (1928).
This is a test which was designed, says Gordon W. Allport, to yield


. . . a score which indicates the incidence in the personality of the two traits
ascendance or submission. The method of the test is to present verbally certain
situations of life, and to require the subject to select from a few standardized choices
that type of behavior which most nearly characterizes his own usual adjustment to
each of the situations.


The Allports postulated the existence of two traits, ascendance and
submission, and theorized that as one of these traits would be
prominent in an individual, the other trait would be subordinate.
They proceeded, on this theory, to choose a variety of situations in
which they thought a person would tend to be dominant or submissive.
An entire trait continuum would be represented, they said,
not in any one situation, but by the average behavior of a person
in a great number of these situations. An individual might be dominant
in one situation and not in another. But a more dominant
person would tend to be dominant in a greater number of situations
than would a less dominant person, and so forth. A score on the
test would be an algebraic summation of the number of situations
in which the person was, or felt he would be, dominant. If the situations
in the test could be made representative of all situations in
which an individual can find himself, it could be assumed that a
score represents some general tendency or habit on the part of the
subject.

The test is available in one form for men and in another form for
women. A typical situation is as follows:


a. At a reception or tea do you seek to meet the important person present?
   Frequently ____
   Occasionally ____
   Never ____

b. Do you feel reluctant to meet him?
   Yes, usually ____
   Sometimes ____
   No ____


This illustration represents 1 situation, 2 items, and 6 choices. 
Counting in this way, the men’s form has 33 situations, 41 items, and 
123 choices. The women’s form has 35 situations, 49 items, and 140
choices. The directions which a subject is asked to follow in taking 
the test are as follows: 


Most of these situations will represent to you your own actual experiences. Reply 
to the questions spontaneously and truthfully by checking the answer which most 
nearly represents your usual reaction. If the situation has not been experienced 
endeavor to feel yourself into it and respond on the basis of what you believe your 
reaction would be. If a situation seems totally unreal or impossible to respond to 
you may omit it. 


Development. The steps in the development of the Ascendance- 
Submission Reaction Study may be listed as follows: 


1. A list of situations was collected. 

2. The test was given to a group of experimental subjects. 

3. These subjects and their close associates completed a seven-step rating scale. 

4. Mean ratings for each of the alternate answers to each question were deter- 
mined. 

5. Scoring weights were assigned.

6. Norms were prepared. 


All situations were selected by the Allports on an a priori basis. 
The test was given to 400 men at Dartmouth and to 200 women at 
Goucher, Wellesley, and Radcliffe. Freshmen were excluded, but all 
other classes were represented. At the same time that students were 
asked to take the test, they were also asked to rate themselves and 
to have four of their close associates rate them on a seven-point 
rating scale. The directions for making these ratings were as follows: 


Kindly rate the student who gives you this paper in regard to the trait of his
personality which is described below. Place a check against the phrase which seems
to you to represent best his customary level of behavior.

____ Strongly marked tendency to take the active role, to dominate, lead, organize,
     in dealing with his fellows.
____ Marked tendency to take the active role.
____ Slightly above average in tendency to take the active role.
____ Average: neither distinctly active nor passive.
____ Slightly under average in tendency to take active role.
____ Tendency to be passive in contact with his fellows, to be led rather than to
     be the leader.
____ Strongly marked tendency to be passive in contacts.


Subjects were divided according to the answers given on each
question. Then for each of these groups the average of the five ratings
was ascertained. For example, one situation had these alternative
answers: “habitually,” “occasionally,” “never.” The average
ratings for the subjects giving these answers were 3.35, 3.50, and
3.57. Since lower ratings here indicate greater ascendance,
those who answered “habitually” were rated more dominant
than those who answered “occasionally.” And these subjects, in
their turn, were rated more dominant than those who answered
“never.”

The average rating assigned to all subjects was 3.48. Each of the
foregoing means was subtracted from this average rating, which yielded
the differences .13, -.02, and -.09. These differences were rounded
to one decimal place, the decimal points were dropped, and the results
became the scoring weights. In this instance these weights are 1, 0, and -1
for the answers “habitually,” “occasionally,” and “never.” A total
score is obtained by an algebraic summation of the values associated
with the responses which a subject gives. The maximum possible
range of scores extends from -79 to 81 for men and from -91 to
112 for women. The actual range in the original subject population
was -55 to 64 for men and -50 to 59 for women.
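
The derivation of the scoring weights lends itself to a brief sketch. The mean ratings are those of the worked example in the text; the function names `scoring_weights` and `total_score` and the two-item test at the end are hypothetical. The sketch assumes that rounding a difference to one decimal place and dropping the decimal point is equivalent to expressing the difference in tenths and rounding to the nearest whole number.

```python
def scoring_weights(group_means, grand_mean):
    """Turn the mean rating for each answer into an integer scoring weight.

    Each group mean is subtracted from the grand mean, and the difference
    is expressed in tenths and rounded to the nearest whole number.
    """
    return {answer: round((grand_mean - mean) * 10)
            for answer, mean in group_means.items()}


def total_score(responses, weights_by_item):
    """Algebraic sum of the weights attached to a subject's chosen answers."""
    return sum(weights_by_item[item][answer]
               for item, answer in responses.items())


# Mean ratings from the worked example in the text.
means = {"habitually": 3.35, "occasionally": 3.50, "never": 3.57}
weights = scoring_weights(means, grand_mean=3.48)
print(weights)

# Hypothetical two-item test that reuses the same weights for both items.
weights_by_item = {"item_1": weights, "item_2": weights}
responses = {"item_1": "habitually", "item_2": "never"}
print(total_score(responses, weights_by_item))
```

A subject who gives one ascendant and one submissive answer on this made-up test receives weights that cancel, which is what an algebraic summation implies.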


Reliability. The Allports do not give extensive data on reliability
but report that on a six-week retest, comparing an earlier and a
later revision, they found an intercorrelation of .78. This value is
subject to qualification, however, for only 37 per cent of the
situations in the two revisions were identical. For men, a more
conventional split-half reliability of .74 is reported. This was
obtained from a stepped-up correlation of .58 between the scores based
on the situations listed on pages 1, 3, and 5 and the scores based on
the situations listed on pages 2, 4, and 6.

Validity. The validity data which the Allports offer consist of
correlations between the scores on the test and the ratings used in
determining the scoring weights. The ascendance-submission scores
correlate .63 with self-ratings; .50 with the total of all ratings (self-
and associates’ ratings combined); and .46 with associates’ ratings.
At first glance these coefficients might appear to be quite satisfac- 
tory. But since the ratings involved were the same as those used in 
establishing the scoring system, these coefficients can in no sense be 
considered remarkable. When the Allports used the records of 42 
men and 51 women not included in the item-standardization process, 
the correlations between scores and total ratings were found to be
.29 and .30. These correlations are low, but they still indicate a
significant relation between the scores and the ratings. 


THE THURSTONES’ PERSONALITY SCHEDULE 


We have now explained the methods by which three of the his- 
torically important unidimensional tests of personality were con- 
structed. And we have found three different methods by which the 
significance of the items can be determined. These are by judgment, 
by incidence, and by correlation with an external criterion. 

Woodworth, as we have seen, placed chief reliance upon judgment 
—his own and that of the textbook authors and authorities from 
whom he collected his questions. Woodworth supplemented this 
judgment, however, by giving consideration to the incidence of the 
responses in both normal and psychoneurotic populations. 

Laird supplemented Woodworth’s judgment with his own but 
placed his chief reliance for determining item significance upon the 
incidence of the responses in a normal college population. We recall, 
for example, that the only responses contributing to the total score 
on the Colgate Mental Hygiene Test are those in the deviant quar- 
ters of the answer distributions. 

In the Allports’ study we come, for the first time, to a consideration
of the use of an external criterion for determining item significance.
Perhaps we should say semiexternal criterion rather than
external criterion because a part of this criterion was a self-rating.
The chief point, however, is that something besides the judgment
of the investigator or the incidence of the item responses was used
in determining item significance. Another first in the Allports’ study
is the use of something other than a series of unit weights in scoring.
We shall find that this differential weighting of test items has since
played an important part in the history of personality-test
development, even though the most recent evidence and belief are to the
effect that differential systems of weighting do not do all that is
claimed for them.

We are now ready to begin our discussion of three more unidimen- 
sional approaches to the measurement of personality. And in these 
we shall find two additional methods of determining item signif- 
icance. One of these methods will consist of the criterion of internal 
consistency which we encountered in our discussion of Likert’s a 
posteriori method of attitude-scale construction. And the other 
method will consist of a criterion of external consistency, somewhat 
more objectively applied than in the case of the Ascendance-Sub- 
mission Reaction Study. 

Our fourth example of a unidimensional approach to the measurement
of personality is that contained in the Thurstones’ Personality
Schedule. This schedule was developed in 1928 by Louis L. Thurstone
and Thelma Gwinn Thurstone. It consists of 223 questions to each
of which a subject must answer “Yes,” “No,” or “?.” The schedule
was designed, say the Thurstones, to yield a “fairly reliable” index
of the neurotic tendencies of university freshmen.

Development. The steps in the development of the Personality
Schedule may be listed as follows:

1. A list of statements was collected.
2. These statements were edited.
3. An a priori scoring key was developed.
4. The Schedule was given to a group of subjects.
5. An item analysis was performed.
6. Norms were prepared.


The Thurstones collected a list of over 600 statements. These came 
from their reviews of the Woodworth Personal Data Sheet, of 
House’s monograph “A Mental Hygiene Inventory,” of the Colgate 
Mental Hygiene Test, of Freyd’s monograph “Introverts and 
Extroverts,” and of the Allports’ Ascendance-Submission Reaction 
Study. The 600 questions were typed individually on cards, were 
classified in various groups, were rearranged, were edited, and were 
finally reduced to a select 223 items. These items became the Per- 
sonality Schedule. On an a priori basis the Thurstones decided which 
answers were to be considered symptomatic of neurotic tendency, 
and a scoring weight of 1 was assigned to each such answer. 
Directions. As an initial tryout the Personality Schedule was given
to 694 University of Chicago freshmen. Their instructions were as
follows:


In order that your advisers may help you in the best possible way it is desirable 
that they know something of your personality as well as of your intellectual ability 
and scholarship. The questions in this blank are intended to indicate various emo- 
tional and personality traits. Your answers may reveal a well-adjusted emotional 
life or they may show that you have some form of nervousness or worry which 
you may not yourself understand completely. If your answers show emotional mal- 
adjustment you will have the opportunity to get advice about this aspect of your 
development. If your answers reveal a well-adjusted personality, that fact will be 
known to your advisers. 

This is not an examination. It is not a test in any sense because there are no 
right and wrong answers to any of the questions in this blank. Your admission to 
the University and your scholastic standing will not be affected in any way by 
your answers to these questions. 

Your answers to particular questions will be confidential. They will be known 
only to two or three persons who will study these blanks and who will summarize 
your answers in a brief statement for your Dean. It has been found that some of the 
brightest students have emotional and personality difficulties which can be over- 
come with suitable counsel if the difficulties are known. It will therefore be to your 
own advantage to answer the questions as truthfully as possible. 

In front of each question you will find: Yes No ? 

Draw a ring around one of these three answers for each question. Try to answer 
by “yes” or “no” if it is possible. If you are entirely unable to say even a tenta- 
tive “yes” or “no” to the question, then draw a ring around the question mark. 

Norms. Scores were obtained by counting the number of questions 
to which “neurotic” answers were given. The total possible range 
of scores extended from 0 (if no “neurotic” answers were circled) 
to 223 (if all “neurotic” answers were circled). The actual range 
extended from 5 to 134. 

Item Analysis. The Thurstones now wished to check upon the
“adequacy” of their a priori assignment of scoring weights. Therefore
they selected the 50 most neurotic-scoring subjects and the 50
least neurotic-scoring subjects and counted the number of “neurotic”
answers which each of these groups gave in answer to each
of the 223 questions. They wanted to find out if the 50 most
neurotic-scoring subjects gave “neurotic” answers more frequently
than the 50 least neurotic-scoring subjects. The Thurstones found
this to be true for all but one of the items. In view of these results
they concluded that their a priori assignment of scoring weights had
been satisfactory.
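
The logic of this extreme-groups check may be sketched as follows. The
sketch is a minimal illustration under our own assumptions: the function
name, the data layout (one list of answers per subject), and the handling
of tied scores are all hypothetical and are not taken from the Thurstones'
report.

```python
def extreme_group_counts(answers, key, group_size=50):
    """Count keyed ("neurotic") answers per item in the extreme groups.

    answers: one list of responses per subject, e.g. ["Yes", "No", "?"].
    key:     the a priori keyed answer for each item.
    Returns one (high_group_count, low_group_count) pair per item.
    """
    # 1 where the subject gave the keyed answer, 0 otherwise.
    flags = [[int(r == k) for r, k in zip(subject, key)] for subject in answers]
    totals = [sum(row) for row in flags]

    # Rank subjects by total score; ties keep their original order.
    order = sorted(range(len(answers)), key=lambda i: totals[i])
    low_group = order[:group_size]
    high_group = order[-group_size:]

    counts = []
    for item in range(len(key)):
        high_count = sum(flags[i][item] for i in high_group)
        low_count = sum(flags[i][item] for i in low_group)
        counts.append((high_count, low_count))
    return counts


# Tiny made-up example: 5 subjects, 3 items, extreme groups of 2.
key = ["Yes", "Yes", "No"]
answers = [
    ["Yes", "Yes", "No"],
    ["Yes", "Yes", "Yes"],
    ["No", "Yes", "Yes"],
    ["No", "No", "Yes"],
    ["Yes", "No", "?"],
]
for high_count, low_count in extreme_group_counts(answers, key, group_size=2):
    print(high_count, low_count, high_count > low_count)
```

An item supports the a priori key when its high-group count exceeds its
low-group count. It should be noted that the criterion groups are chosen by
the very scores under examination, so the check is one of internal
consistency only.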
TABLE 56. The Most Differentiating Items in the Personality Schedule*

Are you easily moved to tears?
Does it bother you to have people watch you at work even when you do it well?
Do you have difficulty in making friends?
Are you troubled with the idea that people are watching you on the street?
Have you ever been depressed because of low marks in school?
Do you often experience periods of loneliness?
Do you often feel self-conscious in the presence of superiors?
Do you lack self-confidence?
Do you often feel self-conscious because of your personal appearance?
Do you get stage fright?
Do you have difficulty in starting a conversation with a stranger?
Do you worry too long over humiliating experiences?
Do you often feel lonesome, even when you are with other people?
Do you consider yourself a rather nervous person?
Are your feelings easily hurt?
Do you keep in the background on social occasions?
Do ideas often run through your head so that you cannot sleep?
Are you frequently burdened by a sense of remorse?
Do you worry over possible misfortunes?
Do your feelings alternate between happiness and sadness without apparent reason?
Are you troubled with shyness?
Do you day dream frequently?
Have you ever had spells of dizziness?
Do you get discouraged easily?
Do your interests change quickly?
Can you stand criticism without feeling hurt?
Does your mind often wander so badly that you lose track of what you are doing?
Are you touchy on various subjects?
Are you often in a state of excitement?
Do you frequently feel grouchy?
Do you feel self-conscious when you recite in class?
Do you often feel just miserable?
Does some particularly useless thought keep coming into your mind to bother you?
Do you hesitate to volunteer in a class recitation?
Are you frequently in low spirits?
Do you find it difficult to speak in public?
If you see an accident are you quick to take an active part in giving help?
Do you feel you must do a thing over several times before you leave it?
Are you troubled with feelings of inferiority?
Do you often find that you cannot make up your mind until the time for action has passed?
Do you have ups and downs in mood without apparent cause?
. . . your abilities?

* From Thurstone, L. L., and Thurstone, T. G. A neurotic inventory. J. soc. Psychol., 1930,
1, 3-30.

The 42 items showing the greatest degree of discrimination are
given in Table 56. The content of these most differentiating items 
suggests, to the Thurstones, “that the fundamental characteristic 
of the neurotic personality is an imagination that fails to express 
itself effectively on external social reality.” 

Reliability. When the Personality Schedule was printed, it was 
prepared so that two columns of questions appeared on each one of 
its four pages. In scoring, it was found convenient to enter a subtotal 
at the foot of each column of questions. The Thurstones utilized 
these subtotals in determining the reliability of total scores. They 
correlated the scores obtained from the questions printed in the 
left-hand columns of the schedule with those obtained from the 
questions printed in the right-hand columns. When they obtained 
this coefficient, they entered it in the Spearman-Brown Prophecy 
Formula and found a total score reliability of .95. 
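
The stepping-up of a half-test correlation by the Spearman-Brown Prophecy
Formula may be illustrated briefly. The formula itself is standard; the
function names are ours. Since the correlation between the two halves of
the Personality Schedule is not reported here, the second function merely
works backward from the reported reliability of .95 to the half-test
correlation it implies (about .90), and that figure should be read as an
inference and not as a reported result.

```python
def spearman_brown(r_half, factor=2.0):
    """Predicted reliability when a test is lengthened `factor` times.

    With factor=2 this steps a half-test correlation up to the
    reliability of the full-length test.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def implied_half_correlation(reliability):
    """Invert the two-halves formula: r_half = R / (2 - R)."""
    return reliability / (2.0 - reliability)


r_half = implied_half_correlation(0.95)
print(round(r_half, 3), round(spearman_brown(r_half), 2))
```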

Validity. A score on the Personality Schedule shows the number of
“neurotic” answers checked by a subject. But these “neurotic”
answers are “neurotic” only because the Thurstones said they were.
The author would be one of the last to decry the value of the
Thurstones’ judgment, but we must recognize the fact that no other
standard was used.

Landis and Katz have presented some data which show that the
validity of the Personality Schedule varies with the score involved.
They find that the answers of psychotic and psychoneurotic subjects
with “neurotic” scores agree fairly well with case-history data. But
as the scores of psychotic and psychoneurotic subjects recede from
the “neurotic” end of the continuum and become more “normal,”
the less becomes their correspondence to case-history data. These
findings are summarized in Table 57. These findings agree, by the
way, with the Thurstones’ own statement indicating greater validity
for the “neurotic” than for the “nonneurotic” scores.


TABLE 57. Percentage of Answers Agreeing with Case-history Findings*

Percentile      Number      Percentage
90-100            25            91
80- 89            10            79
50- 79            [?]           [?]
20- 49            21            70
10- 19             6            60
 0-  9            10            60
[?]              224            [?]

* From Landis, C., and Katz, S. E. The validity of certain questions which purport to
measure neurotic tendencies. J. appl. Psychol., 1934, 18, 343-356.


BERNREUTER’S SELF-SUFFICIENCY TEST 


Our fifth example of a unidimensional approach to the measurement
of personality is that contained in Bernreuter’s test of
Self-sufficiency. This is a test which Bernreuter prepared prior to the
development of his widely used Personality Inventory, which, as
we said before, we shall discuss in the next chapter.

The Self-sufficiency Test was published in 1933 under the title
“Personal Preference Blank.” It consists of 60 questions to which a
subject is asked to answer “Yes,” “No,” or “?.” These questions
are supposed to elicit answers which will be indicative of the extent
to which a subject is dependent upon or is not dependent upon other
persons. A subject who is not dependent upon other individuals is
called a self-sufficient person; hence the name of the test.
Development. The steps through which Bernreuter proceeded in
the development of the Self-sufficiency Test may be listed as follows:

1. A list of statements was collected.
2. These statements were edited.
3. An a priori scoring key for the “yes” and “no” answers was developed.
4. The test was given to a group of subjects.
5. An item analysis was performed.
6. The test was revised.
7. Scoring weights for “?” answers were assigned.
8. The revised test was given to new groups of subjects.
9. Percentile norms were prepared.

Bernreuter’s first step was to collect a list of 132 items. These
items were worded in the form of questions so that a subject could
answer “Yes,” “No,” or “?.”

His next step was to develop, with the aid of his colleagues
at Washington University (St. Louis), an a priori scoring
key for the “yes” and “no” answers. In this key one point was
assigned for each answer thought to be indicative of self-sufficiency.
At this stage no scoring weights were assigned for “?” answers.

The next step consisted of giving the 132 items to 128 Washington
University elementary-psychology students. Here are their
instructions:

The questions on this sheet are intended to indicate your likes and dislikes. 
It is not an intelligence test, nor are there any right or wrong answers. 

In front of each question you will find: Yes No ? 

If your answer is “Yes” draw a circle around the “Yes.” If your answer is 
“No” draw a circle around the “No.” If you are entirely unable to answer either 
“Yes” or “No” to the question then draw a circle around the question mark. 


Item Analysis. Scores were obtained by counting the number of 
answers which, according to the a priori key, were considered indica- 
tive of self-sufficiency. The scores obtained ranged from 16 to 117. 
The papers were now arranged in order according to these scores, and 
the 24 highest scoring students and the 24 lowest scoring students 
were selected as criterion groups for further study. To select these 
groups Bernreuter noted the scores falling at plus and minus one 
standard deviation unit from the mean of the distribution and then 
used the scores nearest these two points that would give him an equal 
number of cases in the two criterion groups. 

To get an index of the discriminatory power of each item Bernreuter
counted the number of “Yes,” “No,” and “?” answers given
by each criterion group and subtracted the number of low-scoring 
subjects giving a designated answer from the number of high-scoring 
subjects giving this same answer. For example, if 12 low-scoring 
subjects and 20 high-scoring subjects answered “Yes,” the 12 was 
subtracted from the 20 to yield a difference of 8 (i.e., 20 — 12 = 8). 
This difference was taken as a measure of discriminative value. The 
range of all differences extended from 0 to 19. When all differences 
had been computed, Bernreuter selected the 60 most discriminating 
items to constitute the final form of the Self-sufficiency Test. 
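
Bernreuter's index of discrimination reduces to a subtraction of counts,
as the following sketch shows. The function names and the layout of the
data are hypothetical; in particular, the rule used here for ranking items
(largest difference first) is our reading of “most discriminating” and is
not spelled out in the account above.

```python
from collections import Counter


def answer_differences(high_answers, low_answers):
    """High-group count minus low-group count for each possible answer."""
    high = Counter(high_answers)
    low = Counter(low_answers)
    return {answer: high[answer] - low[answer] for answer in ("Yes", "No", "?")}


def most_discriminating(values, n):
    """Indices of the n items with the largest discriminative values."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return ranked[:n]


# The worked example from the text: 20 high-scoring and 12 low-scoring
# subjects answer "Yes" (the remaining answers are made up).
high_group = ["Yes"] * 20 + ["No"] * 4
low_group = ["Yes"] * 12 + ["No"] * 10 + ["?"] * 2
print(answer_differences(high_group, low_group)["Yes"])
```

Applied to the example in the text, the difference for “Yes” is 20 - 12 = 8.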

At this point Bernreuter determined the significance of the “?”
answers. He did this by comparing the number of high-scoring
subjects and the number of low-scoring subjects giving a “?” as an
answer to each of the questions. When Bernreuter found that more
of the high-scoring subjects gave a “?” as an answer, he assigned
the “?” a scoring weight of 1. He also assigned it such a weight if the
number of high-scoring and low-scoring subjects were the same and
if the “high scoring group shows less consistency within itself in
responding to that particular item. . . .” On 12 items the “?”
was found to be indicative of self-sufficiency.
Norms. The revised (60-item) edition of the test was then given to
388 men and 456 women. Subjects were secured from Washington
University, Stanford University, Chico (California) State College, 
San Francisco State College, and Menlo (Menlo Park, California) 
Junior College, and upon the basis of the responses of these subjects 
percentile norms were prepared. 

Reliability and Validity. Bernreuter determined the reliability 
of the Self-sufficiency Test both by the split-half and by the test- 
retest techniques. Both techniques led to a coefficient of .84. 

The validity of the self-sufficiency scores was established, says 
Bernreuter, by correlating them with a series of ratings. Three 
ratings were secured. Two of these were from close associates, and 
the third was a self-rating. These were on need for sympathy, ap- 
preciation, and encouragement; desire to be alone; frequency of 
asking advice; and ability to handle responsibilities. The correlations 
between these ratings and self-sufficiency scores were .36, .69, .52,
and .18. Self-sufficiency scores correlated .60 with total self-ratings
and .54 with ratings supplied by associates. These coefficients were
based upon the records for 58 women.
Table 58 shows how 21 high-scoring students and 21 low-scoring
students (out of a total of 128 students) differ on their own and on
their associates’ ratings. The figures in the body of the table are
critical ratios of the differences between mean ratings assigned to
the high-scoring and low-scoring subjects.


TABLE 58. Significance of the Differences between the Mean Ratings Assigned to
High- and Low-scoring Students on the Self-sufficiency Test*

                                                Self-    Associates’   Combined
Trait                                           rating     rating       rating
R1 Need for sympathy, appreciation and
   encouragement                                 0.83       1.60         1.95
R2 Desire to be alone                            [?]        [?]          [?]
R3 Frequency of asking advice                    [?]        [?]          [?]
R4 Ability to handle responsibilities            [?]        [?]          [?]

* From Bernreuter, R. G. The measurement of self-sufficiency. J. abnorm. soc. Psychol.,
1933, 28, 291-300.

These critical ratios show that self-sufficiency scores are more
nearly a reflection of “desire to be alone” and of “frequency of
asking advice” than they are of “need for sympathy, appreciation,
and encouragement” and of “ability to handle responsibilities.”
Bernreuter does not seem perturbed that the self-sufficiency scores
do not correlate so well with these latter traits, but he must have
felt that they were integral parts of the trait “self-sufficiency,” or
there would have been no reason to seek ratings on them in the first
place.
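
For the reader unfamiliar with the critical ratio, a brief sketch may be
helpful. A critical ratio is the difference between two means divided by
the standard error of that difference. The form given below is the
conventional one for two independent groups; we assume it for illustration
only, since the exact computation Bernreuter used is not described here.
The function name and the ratings in the example are made up.

```python
import math
from statistics import mean, variance


def critical_ratio(high_ratings, low_ratings):
    """Difference between two group means over its standard error.

    Assumes independent groups; `variance` is the sample variance.
    """
    standard_error = math.sqrt(
        variance(high_ratings) / len(high_ratings)
        + variance(low_ratings) / len(low_ratings)
    )
    return (mean(high_ratings) - mean(low_ratings)) / standard_error


# Made-up ratings for three high-scoring and three low-scoring students.
print(round(critical_ratio([4, 5, 6], [1, 2, 3]), 2))
```

The larger the ratio, the less likely it is that the difference between
the two groups of ratings arose by chance.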

The results presented in Table 58 led Bernreuter to compute the 
correlation between self-sufficiency scores and a sum of the ratings 
on “desire to be alone” and “frequency of asking advice.” The 
correlation with self-ratings was found to be .62, with associates’ 
ratings, .44, and with combined ratings, .58. The number of subjects 
was not large, but these correlations can be considered substantial. 
But even so, it is difficult to see why we should accept as relevant 
ratings on only two of the traits and discard as irrelevant those 
which did not result in such high correlations. 


THE TERMAN-MILES MASCULINITY-FEMININITY TEST 


Our sixth and last example of a unidimensional approach to the 
measurement of personality is contained in the Terman-Miles Atti- 
tude Interest Analysis Test, more popularly known as the M-F test. 
In it we shall find the use of an external criterion for the determina- 
tion of item significance in contrast with the use of an internal 
criterion, such as that used by Bernreuter and the Thurstones. 

We shall also find in the M-F test another important contrast with 
the approaches we have been discussing. In these approaches some 
trait has been visualized or defined, and a test has been built with 
the purpose of measuring whatever trait had been visualized or 
defined. If criterion groups were needed, they were selected upon 
the basis of the test being constructed. 

In the Terman-Miles M-F test we follow a radically different 
procedure. We define our criterion groups ahead of time and permit 
our test items to have no part in their selection. We set as our objec- 
tive the building of a test to distinguish our previously defined and 
selected criterion groups. The test scores acquire meaning with 
reference to the nature of these criterion groups. This is in marked
contrast with a test based on a criterion of internal consistency, in
which case the criterion groups are defined and selected in terms of
test scores.

We shall find that the Terman-Miles technique of defining and
selecting criterion groups before constructing the test yields results 
far superior to any secured by internal consistency techniques. We 
found that this same fact applied in our discussion of interest tests. 
Dr. Strong used, as we will recall, carefully defined criterion groups 
and achieved remarkably successful results. Dr. Kuder constructed 
his Preference Records by techniques which did not make use of 
criterion groups, and, to date, his test must be considered to possess 
considerably less validity than we can ascribe to the scores on the 
Strong Vocational Interest Test. It is admittedly difficult, in the 
area we are now discussing, to obtain easily identified criterion 
groups. However, Terman and Miles did this in their development 
of the M-F test, and to their work we now turn. 

Purpose and Content of Test. According to Terman and Miles:

The purpose of the investigations which led to the development of the M-F test, [was] the
accomplishment in the field of masculinity-femininity of something similar to Binet’s
early achievement in the field of intelligence, a quantification of procedures and
concepts. . . . A measure is needed which can be applied to the individual and
scored so as to locate the subject, with a fair degree of approximation, in terms of
deviation from the means of either sex. Range and overlap of the sexes must be
more accurately determined than is possible by observational and clinical methods.
. . . The purpose of the M-F test is to enable the clinician or other investigator to
obtain a more exact and meaningful, as well as a more objective, rating of those
aspects of personality in which the sexes tend to differ. More specifically, the
purpose is to make possible a quantitative estimation of the amount and direction of a
subject’s deviation from the mean of his or her sex, and to permit quantitative
comparisons of groups differing in age, intelligence, education, interests,
occupation, and cultural milieu. [And, finally,] it is evident that no clear delineation of
sexual temperament is possible on the basis of uncontrolled observation. The M-F
test is an attempt to remedy this situation. Its scientific intent is to free the
concepts of masculinity-femininity from the irrelevancies and confusions which have
become attached to them as a result of superficial consideration of everyday
behavior.

The M-F test is available in two equivalent forms. It consists of a
variety of stimulus objects to each of which a subject must respond
by checking or underlining one of two, three, or four alternate
answers. The contents of the test may be classified as in Table 59.

Development of Test. The idea of constructing a masculinity-femininity
test first occurred to Dr. Terman when he was
working over some sex-difference data for his group of gifted children.
Terman had classified a group of games and childhood amusements
in accord with their relative preference by boys and girls and was
attempting to work out a masculinity index upon the basis of an
individual’s preference for activities preferred by boys in contrast
with those preferred by girls. In preparing separate sex distributions
of the masculinity indices, one of Terman’s assistants noticed what
seemed to be an error. One of the boys received a more feminine
score than any of the girls. A recheck was made but no error was
found. This called into question the correctness of the sex
classification. This was checked and no error was discovered. This led to a
careful investigation and to the preparation of a complete case
history of the boy in question, with verification of his feminine
interests, propensities, attitudes, and behavior.


TABLE 59. Item Content of the Terman-Miles M-F Test*

                                           Number of items
Exercise                                  Form A     Form B
1. Word association                          60         60
2. Ink-blot association                      18         18
3. Information                               70         70
4. Emotional and ethical responses          105        105
5. Interests                                119        118
6. Personalities and opinions                42         41
7. Introvertive responses                    42         42

   Total                                    456        454

* From Terman, L. M., and Miles, C. C. Sex and Personality. New York: McGraw-Hill
Book Company, Inc., 1936.


Terman and Miles give, with more than usual completeness, a
step-by-step account of the development of each part of the M-F
test. We shall describe these steps in an attempt to give the reader
some idea of the tremendous amount of spadework necessary to
develop an adequate psychological test.

Word Association. The first step in the preparation of the
word-association test was that of scanning a short English dictionary for
words which appeared to be capable of eliciting different responses
from men and women. These words were rated by three judges as
to their probable value in being able to elicit sex differences. As a
result of these ratings, 280 words were discarded. The remaining
220 words were divided into two sets of 110 words each. These words
were printed, individually, on cards and were given to 200
high-school and college men and to 200 high-school and college women,
with instructions to respond to each of the stimulus words with the
first word thought of. A scoring system was worked out, and the
stimulus words were given to additional subjects. The results were
unsatisfactory, however, so a new approach was tried.

This new approach involved printing the stimulus words in a
booklet with four possible response words following each stimulus
word. The subject was instructed to underline one of the response
words, the one which best went with the stimulus word.

For the most part, response words were those which, in the
preliminary studies, showed some evidence of yielding sex differences.
However, 51 additional and previously untried stimulus words were
added, and of these, 28 were found to be usable. In each set of four
response words two were preferred by men and two by women.

The test now consisted of 171 stimulus words and was given, in
this form, to 600 subjects. A study of the responses yielded 120
items for the final form. The words retained were those that showed
sex differences in the same direction for at least three out of four
possible responses in all groups tested (100 boys and 100 girls in the
seventh grade, 100 boys and 100 girls in junior high school, and 100
men and 100 women in college).

The transition from the first uncontrolled response situation to the
second and controlled response situation was dictated by the
following reasons. The first method was found disadvantageous in that
most of the responses had such low frequencies that large numbers of
subjects would be needed to establish sex differences. Many of the
responses, for this reason, could not be scored. Scoring, even when
possible, was found to be laborious and time-consuming. And some
subjects lost their place on the answer sheets and misplaced all
subsequent answers. The second and controlled method was found
to be better adapted to group testing and to require less time for its
administration and scoring.

Ink-blot Association. To initiate work on this section of the test,
Terman and Mary A. Bell made 40 ink blots according to Dearborn’s
directions and used, besides these, 20 blots furnished by Whipple.
None of these blots proved satisfactory, so a new series of 100 blots
was prepared. These were made with printer’s ink with sweeping
strokes of a paintbrush. These 100 blots were given to 100 male and
100 female high-school and college students with a request that
they write down whatever each blot made them think of. Seventy
blots were found to be acceptable (that is, they seemed capable of
eliciting sex differences) and were photographed and reproduced on
the pages of a 3- by 5-inch booklet, one blot per leaf. These booklets
were given to 230 male subjects and to 230 female subjects (seventh
grade, high-school freshmen, college, and adult groups) with these
instructions:


On each leaf of this booklet is a kind of ink blot or drawing. They are not pic- 
tures of anything in particular, but might suggest almost anything to you, just 
as shapes in the clouds sometimes do. Below each drawing write the first thing it 
makes you think of. (Subjects were given 10 seconds for each drawing.) 


The responses were studied in detail and were retained for further 
trial only if they were given by at least four subjects in at least three 
of the four groups studied and if they showed a sex difference in the 
same direction for at least three of these four groups. 
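
The two-part retention rule just described may be expressed compactly.
The following is a sketch under stated assumptions and not the authors'
procedure: the function name is hypothetical, each group is represented
by a pair of counts (males giving the response, females giving the
response), and the male and female samples within a group are assumed to
be of equal size, so that raw counts may be compared directly.

```python
def retain_response(group_counts, min_subjects=4, min_groups=3):
    """Apply the two retention criteria to one response word.

    group_counts: one (male_count, female_count) pair per group.
    """
    # Criterion 1: given by at least `min_subjects` subjects in at
    # least `min_groups` of the groups.
    frequent = sum(1 for m, f in group_counts if m + f >= min_subjects)

    # Criterion 2: a sex difference in the same direction in at
    # least `min_groups` of the groups.
    male_lead = sum(1 for m, f in group_counts if m > f)
    female_lead = sum(1 for m, f in group_counts if f > m)

    return frequent >= min_groups and max(male_lead, female_lead) >= min_groups


# Made-up counts for the four groups (seventh grade, high-school
# freshmen, college, adults).
print(retain_response([(6, 1), (5, 2), (4, 0), (1, 1)]))
print(retain_response([(6, 1), (1, 5), (4, 0), (1, 1)]))
```

In the first made-up case the response is frequent enough in three groups
and favors males in three groups, so it is retained; in the second the
direction of the difference is not consistent, so it is dropped.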

The number of items was reduced to 50, and the response words 
which met the above criteria were printed after each stimulus blot. 
In this final form the test was given to 600 subjects (300 male and 
300 female) and reduced as a result of this new exposure to a total 
of 36 items. These were equally divided between Form A and Form 
B. 

Information. Two hundred items of information were prepared. 
These covered history, physical science, biological science, literature, 
general information, household arts, religion, and mythology. These 
items were prepared in multiple-choice form and were given to 800 
subjects. The successes and failures were separately tabulated for 
each of four subject groups (seventh graders, high-school students, 
college students, and adults). The items retained for further trial 
had to show a significant difference in the same direction for at least 
three of the four populations. Ninety-one items were retained and 
given a further tryout on new populations. At this same time 491 
additional items were also tried and of this group 95 proved worth 
retaining. Therefore these 95 items plus the previous 91 gave 186 
items for further trial. 

On these items a new method of scoring was tried. On all previous 
trials the scoring procedure was to count the number of masculine 
items correctly answered and to subtract from this the number of 
feminine items correctly answered. It was found during the course
of investigation, however, that some of the wrong responses and 
omissions were just as productive of sex difference as were some of 
the right answers. Therefore steps were taken to include omissions 
and wrong answers in the scoring formula. The final test consists of 
140 items equally divided between the two forms. 
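
The earlier scoring procedure, masculine items right minus feminine items right, is simple enough to sketch. The function and argument names below are hypothetical, and the sketch covers only the original rule; the final formula, which also credits omissions and wrong answers, is not specified in enough detail here to reproduce.

```python
def simple_mf_information_score(answers, key, masculine_items, feminine_items):
    """Masculine items answered correctly minus feminine items answered correctly.

    answers: dict mapping item number to the option chosen (omitted items absent)
    key:     dict mapping item number to the correct option
    """
    masculine_right = sum(1 for i in masculine_items if answers.get(i) == key[i])
    feminine_right = sum(1 for i in feminine_items if answers.get(i) == key[i])
    return masculine_right - feminine_right
```

Under this rule an omission and a wrong answer both count for nothing, which is exactly the information the revised formula set out to recover.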

Emotional and Ethical Attitudes. Terman and his associates prepared
a total of 218 items for this part of the M-F test. They covered
the “emotions” of anger, fear, disgust, pity, and a variety of ethical
attitudes. The subject was instructed to read each of the stimulus
words (e.g., gum chewing) and to indicate to what extent the situation
“tended to provoke in him the emotion in question.” The
subject could answer VM (Very much), M (Much), L (Little), or
N (Not at all). The test items were given to over 800 subjects, and
items were retained if they showed a significant sex difference on at
least two of the four possible responses. Terman and his associates
decided to retain 195 items and to divide them between the two
forms of the test.

Interests. For this part of the test 456 items were assembled, most
of them coming from the Strong Vocational Interest Test. The test
was given to 245 subjects, and items were retained if they showed a
sex difference on two of the three responses L, I, and D, if they were
“probably” significant, and if they were in the same direction for all
subject groups tested, i.e., seventh graders, high-school students,
college students, and adults. Of these, 170 items were retained and,
along with 60 new items on historical characters and 40 items on
preferences for contrasting activities, were given to new subject
groups. For the final forms 187 items were kept.

Opinions. Ninety-six items were prepared and given to 100 boys
and 100 girls in the seventh grade and high school and to 50 men
and 50 women in college. Twenty-eight items yielded significant
differences in each of the three subject populations and were divided
into two groups of 14 items each for the alternate forms.

Introvertive Responses. Cady’s revision of the Woodworth Personal
Data Sheet was given to 100 children in Terman’s gifted group and
to 100 controls. The responses of the sexes were compared and the
item was retained if it yielded a critical ratio of 2.0 or more for either
the “Yes” or “No” response. A series of 47 items from the Laird
C-2 and Heidbreder introversion-extroversion schedules was tried
out on seventh graders and high-school students. For the final scales
84 items were selected. These had to be significant on both the
“Yes” and “No” responses.

Reliability and Validity. In most of the sections of the M-F test 
weighted and unit-scoring procedures were tried. It was found that 
the weighted scores added very little in the way of reliability or 
subtracted very little in the amount of sex overlap, so they were 
discarded in favor of unit scores throughout. 

Reliability. Terman and Miles are meticulous in giving data con- 
cerning the reliabilities of many of the preliminary and trial test 
forms. It will be sufficient for our purposes here, however, if we take 
note only of the reliabilities of the final forms of each part of the 
test. They are given in Table 60. 


Table 60. Spearman-Brown Reliability Coefficients for the M-F Test*


                                              Single     Both
Exercise                                      sex        sexes

1. Word association                           .40        .62
2. Ink-blot association                       .25        .34
3. Information                                .50        .68
4. Emotional and ethical attitudes            .89        .90
5. Interests                                  .60        .80
6. Opinions                                   .54        .64
7. Introvertive responses                     .24        .32
[row label illegible in source]               [illegible] .90
Either form (Spearman-Brown)                  .78        .92
Both forms (A and B)                          .88        .96



* From Terman, L. M., and Miles, C. C. Sex and Personality. New York: McGraw-Hill Book 
Company, Inc., 1936. 
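
The last two rows of Table 60 can be checked against each other with the general Spearman-Brown formula, which predicts the reliability of a test lengthened k times from the reliability r of the original. The sketch below is ours, not Terman and Miles's computation; it merely shows that doubling either form (k = 2) reproduces the figures reported for both forms combined.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability when a test of reliability r is lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)


# Either form -> both forms, using the values reported in Table 60.
single_sex_both_forms = spearman_brown(0.78)   # compare with the tabled .88
both_sexes_both_forms = spearman_brown(0.92)   # compare with the tabled .96
```

The agreement suggests that the "both forms" coefficients were stepped up from the single-form coefficients rather than computed independently, though the source does not say so.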


Most of the part reliabilities are low. The only parts that have
reasonably satisfactory reliabilities are the emotional and ethical
attitudes and the test of interests. The remaining reliabilities can be
called into question, particularly those for the ink-blot and
introvertive sections.

The reliability of the entire test, of either form or of both forms
combined, is satisfactory. Therefore we may, with some degree of
assurance, rely upon the total score, even though we must view with
considerable suspicion the value of some of the part scores. For this
reason Terman and Miles repeatedly warn against profile-type
analyses and suggest that until such time as the reliabilities of the
parts can be increased, investigators confide only in the total score.


Interrelation of Parts. The correlations among the scores on the
different parts of the M-F test are given in Table 61. As the M-F
test is supposed to mirror the differences between men and women
in our present Occidental culture, it would lose seriously in value if
it did not reflect a large number and variety of these differences.
Therefore it is particularly important that a wide variety of psychological
functions be tapped by the test. The extent to which the
M-F test does tap different functions can, to some degree, be judged
by the intercorrelations among part scores. If we should find that the
various parts of the test are all highly intercorrelated, we might
have some reason to suspect that an insufficient variety of functions
were tested. On the other hand, if we find low intercorrelations, we
can infer that the test taps at least as many separate functions as
there are parts to the test. The correlations among the parts are
low, so we can conclude that different areas are being tapped.
We must temper this conclusion, however, with our knowledge of
the low part reliabilities and with our awareness that these low
reliabilities undoubtedly contribute to the low interpart correlations.


Table 61. Intercorrelations among the Parts of the M-F Test*


                                  Ink-blot             Emotional                      Intro-
                                  associ-    Infor-    and ethical  Inter-   Opin-    vertive
Exercise                          ation      mation    attitudes    ests     ions     responses

Word association                  .05        [illeg.]  -.15         .07      -.02     -.04
Ink-blot association                         [illeg.]  [illeg.]     [illeg.] [illeg.] [illeg.]
Information                                            [illeg.]     [illeg.] [illeg.] [illeg.]
Emotional and ethical attitudes                                     [illeg.] [illeg.] [illeg.]
Interests                                                                    [illeg.] [illeg.]
Opinions                                                                              [illeg.]


* From Terman, L. M., and Miles, C. C. Sex and Personality. New York: McGraw-Hill Book
Company, Inc., 1936.

Validity. The validity of the M-F test is easy to appraise if
we restate its purpose: to differentiate between the sexes. We
have only to determine if boys and men get different scores from girls
and women. This is almost universally the case. The range of scores for
men runs from approximately 200 to —100, while that for women
runs from approximately 100 to —200. The average score for men 
is 52, and that for women is —70. The amount of overlap in the 
distribution of scores for the two sexes is given in Table 62. 


Table 62. Overlap in the Sex Distributions on the M-F Test*


Exercise                                      Percentage overlap

1. Word association                           18.56
2. Ink-blot association                       [illegible]
3. Information                                16.97
4. Emotional and ethical attitudes            28.31
5. Interests                                  [illegible]
6. Opinions                                   [illegible]
7. Introvertive responses                     30.07

Total                                         8.02


* From Terman, L. M., and Miles, C. C. Sex and Personality. New York: McGraw-Hill
Book Company, Inc., 1936.

According to these data we can conclude that the M-F test as a
whole validly distinguishes between the two sexes. We can see that 
Exercise 5 (interests) is almost as valid as the entire test and that 
Exercise 2 (ink-blot association), Exercise 6 (opinions), and Exer- 
cise 7 (introvertive responses) are the least valid parts of the test. 

However, an alternative explanation would be to the effect that 
men and women do actually differ more in their interests and less in 
their responses to ink blots and introvertive items. There is really 
no wholly objective way of deciding between the alternative
conclusions: that some parts of the test are more valid than others or
that in the functions which to a great extent overlap men and women 
are really less dissimilar than in some of the other sections of the 
test. 

With our discussion of the Terman-Miles M-F test we bring to a 
close our chapter on unidimensional approaches to the measurement 
of personality. There are many other tests that we could, with profit,
discuss, but we have selected and discussed those that illustrate a
method, point a moral, or, as we said once before, have led to some
later development. We shall now find it profitable to turn our
attention to these later developments.


7


PERSONALITY: MULTIDIMENSIONAL APPROACHES


Each of the tests we discussed in Chap. 6 was designed to yield one
score. In this chapter we propose to discuss tests designed to yield
several scores. These tests are examples of what we call multidimensional
approaches to the measurement of personality. A multidimensional
approach may consist in the simultaneous use of several
tests of the unidimensional type, or it may consist in the use of the
same set of items scored in different ways. And there are, of course,
gradations between these extremes.


THE PERSONALITY INVENTORY 


We shall begin our discussion of multidimensional approaches by 
describing the Bernreuter Personality Inventory. This test consists 
of 125 questions to which a subject is asked to answer “Yes,” “No,” 
or “?.” It was developed by Robert G. Bernreuter and was first 
published in 1932. It was designed to do the work of four of the tests 
we discussed in the preceding chapter and, consequently, yields a
series of scores serving the same purposes as the scores on these
original tests. These scores are supposed to measure neurotic tendency,
self-sufficiency, introversion-extroversion, and dominance-submission.
Individuals scoring high and low on each of these variables
can, according to Bernreuter, be characterized as follows:


High B1 N. The individuals that score high on this scale show a tendency toward
a neurotic condition. Such an individual often feels miserable, is sensitive to blame,
and is troubled by useless thoughts, by shyness, and by feelings of inferiority. He
feels shut off from other people, he frequently daydreams, and worries both over
things that have happened and over things that may happen.

Low B1 N. The individual who scores low on the B1 N scale is an emotionally
stable person. He is rarely troubled by moods, by worries, or by the criticism of
others. He is self-confident, and is a doer rather than a daydreamer.

High B2 S. The individual who scores high on the B2 S scale is a self-sufficient 
person. He is able to be contented when by himself. He prefers to work alone and 
depends upon his own judgment in reaching decisions and in formulating plans. 

Low B2 S. The individual who scores low on the B2 S scale is dependent upon 
others for his enjoyments. He likes to be with other people a great deal and prefers 
company both while working and during leisure hours. He prefers to talk problems 
over with others and to receive advice before reaching decisions. 

High B3 I. The individual who scores high on the B3 I scale is introverted in the 
sense that he is introspective and is given to autistic thinking. He shows the symp- 
toms of a neurotic condition which are typical of those individuals who score high 
on the B1 N scale. 

Low B3 I. The individual who scores low on the B3 I scale is extroverted in the 
sense that he rarely substitutes daydreaming for action. He is emotionally stable
and possesses the characteristics of those individuals who score low on the B1 N
scale. 

High B4 D. The individual who scores high on the B4 D scale is dominant in 
face-to-face situations with his equals. He is self-confident and aggressive, and 
readily assumes a position in the foreground at social functions. He converses read- 
ily with strangers or with prominent people and suffers no feelings of inferiority 
when doing so. 

Low B4 D. The individual who scores low on the B4 D scale is submissive in
face-to-face situations with his equals. He lacks self-confidence, keeps in the
background at social functions, and rarely takes the initiative in directing people or
activities. He experiences feelings of inferiority and is reluctant to meet important 
personages. 


Development of Test. Personality tests prior to the advent of the
Bernreuter Personality Inventory were constructed upon the basis
of the proposition that a given behavioral element reflected, or
could be explained by, one trait. Not all psychologists accepted this
hypothesis, but those concerned with the construction of personality
tests used no other alternative. It remained for Bernreuter to take
the all-important step and to proceed to test the assumption that the
“behavior of an individual in a single situation may be symptomatic
of several traits. . . .” If this proposition could be established,
Bernreuter reasoned, an item could be assigned one diagnostic weight
for one trait and a different diagnostic weight for another trait. This
would make possible the “construction of tests . . . which could
be used in the simultaneous analysis of several traits. . . .”

To test this hypothesis Bernreuter assembled the items which had
been used in four of the tests we described in Chap. 6 and proceeded
to demonstrate that a common core of these items could measure
each of the original variables just about as accurately as the original
tests. The tests Bernreuter selected as a basis for his operations were
Laird’s Introversion-Extroversion Schedule, the Allports’
Ascendance-Submission Reaction Study, the Thurstones’ Personality
Schedule, and his own test of Self-sufficiency.

Subjects. Bernreuter initiated his work by giving a trial form of
his inventory and the four tests from which its items were taken to 
several groups of students. These students were located at Menlo 
(Menlo Park, California) Junior College, Chico (Chico, California) 
State Teachers College, San Francisco State Teachers College, and 
Stanford University. Approximately 400 students were tested. The 
exact number for each test is given in Table 63. 


Table 63. Number of Subjects Used by Bernreuter in Developing the Personality
Inventory*


Test                                          Men      Women

Laird’s Introversion-Extroversion             202      [illegible]
Allports’ Ascendance-Submission               174      [illegible]
Thurstones’ Personality Schedule              205      [illegible]
Bernreuter’s Self-sufficiency                 244      [illegible]

[Only one figure per test is legible in the source; its column, and the
assignment of 244 to the last row, are uncertain.]


* From Bernreuter, R. G. The theory and construction of the personality inventory. J.
soc. Psychol., 1933, 4, 387-405.


Bernreuter prepared a distribution of scores for each of these tests,
and upon the basis of these distributions he selected a number of
criterion groups to represent extreme deviants on each of the tests.
He selected 50 cases to represent “introverts” and 50 cases to
represent “extroverts.” He selected 50 cases to represent “dominant”
individuals and 50 cases to represent “submissive” individuals,
and so on. There were included in each criterion group 25 men
and 25 women. For each trait these were the 25 highest and 25 lowest
scoring men and the 25 highest and 25 lowest scoring women.


Item Analysis. Bernreuter then proceeded to find out how well the
items in his inventory could differentiate between the two contrasting
criterion groups on each of the four variables. To do this he
computed the number and percentage of each criterion group that
answered “Yes,” “No,” and “?” to each question.

Bernreuter’s next step should have consisted of the computation 
of the difference between the percentages of each contrasting pair of
criterion groups and of the assignment of diagnostic weights upon 
the basis of these differences. Bernreuter did not have to make these 
computations, however. He utilized one of Dr. Strong's item-weight- 
ing charts which makes these intermediate computations unneces- 
sary. Dr. Strong’s chart gives item weights directly from the 
percentages characterizing the criterion groups to be contrasted 
with each other. So when Bernreuter wished to determine the signif- 
icance of a “Yes” answer, he entered in the chart the percentages 
of the two criterion groups answering “Yes.” When he wished to 
determine the significance of a “No” answer, he entered in the chart 
the percentages of the two criterion groups answering “No.” And 
when he wished to determine the significance of a “?” answer, he
entered in the chart the percentages of the two criterion groups
answering with a “?.”
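
The tabulation that feeds the chart can be sketched as follows. This is an illustration only: Strong's chart itself is not reproduced here, so the sketch stops at the percentages and their differences, which are the intermediate quantities the chart made it unnecessary to compute by hand. The function names and the list-of-answers layout are assumptions of ours.

```python
from collections import Counter

RESPONSES = ("Yes", "No", "?")


def response_percentages(answers):
    """Percentage of one criterion group giving each response to one question."""
    counts = Counter(answers)
    n = len(answers)
    return {r: 100.0 * counts[r] / n for r in RESPONSES}


def percentage_differences(high_group, low_group):
    """Difference (high minus low) between two contrasting criterion groups."""
    high = response_percentages(high_group)
    low = response_percentages(low_group)
    return {r: high[r] - low[r] for r in RESPONSES}
```

For a question answered “Yes” by 30 of 50 introverts and by 10 of 50 extroverts, the “Yes” difference is 40 percentage points; a chart such as Strong's would convert the pair of percentages (60 and 20) directly into a diagnostic weight.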

The weights resulting from this procedure ranged from 0 to ±30.
Now, wondered Bernreuter, what standard of elimination should be
used? Obviously, responses with diagnostic values of 0 should be
ignored. But should responses with diagnostic values of ±1, ±2, or
±3 be ignored? To answer this question, Bernreuter tried several
successive eliminations (additions, we should say) of items in
attempting to develop a scale for self-sufficiency. First, he used all
responses with a diagnostic value of ±7 or more. He weighted these
responses equally (i.e., ±1) and computed the reliability of total
scores. It turned out to be .73. Second, he added all responses having
a diagnostic value of ±6, and recomputed the reliability. Then he
added all responses having a diagnostic value of ±5, and recomputed
the reliability, and then added all responses having a diagnostic
value of ±4 and, again, recomputed the reliability. Each time
it increased over the previous value and was then .87. Next,
Bernreuter added all items having a diagnostic value of ±3, but found
that this did not further increase reliability. Therefore, concluded
Bernreuter, responses with diagnostic values of 0, ±1, ±2, and ±3
should not be used.
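
One step of this trial procedure, keeping only the responses at or above a given cutoff and weighting each retained response equally, might be sketched as below. The data layout (a dictionary keyed by question number and response) and the function names are hypothetical, and we assume the diagnostic values are signed; the reliability computation that Bernreuter repeated at each cutoff is left out.

```python
def unit_weights(diagnostic, cutoff):
    """Keep responses whose diagnostic value is at least `cutoff` in absolute
    size, and replace each retained value by a unit weight of +1 or -1."""
    return {key: (1 if value > 0 else -1)
            for key, value in diagnostic.items()
            if abs(value) >= cutoff}


def total_score(answers, weights):
    """Sum the weights of the responses a subject actually gave.

    answers: dict mapping question number to the response given
    weights: dict mapping (question number, response) to a weight
    """
    return sum(weights.get((question, response), 0)
               for question, response in answers.items())
```

Lowering the cutoff from 7 to 6, 5, and 4 enlarges the set of scored responses at each step; the text reports that reliability rose each time until the cutoff reached 3.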

The elimination of responses with diagnostic values of ±3, or
less, still left a range of values extending from ±4 to ±30. This series
of values proved too cumbersome to retain in the actual scoring
process, so at this point Bernreuter substituted a reduced but
proportional series of weights ranging from 0 to ±7. When these
new weights were used in the Self-sufficiency Scale, a reliability of
.92 was obtained. The development of the remaining scales followed
the pattern established by the Self-sufficiency Scale. Responses with
diagnostic values of 0, ±1, ±2, and ±3 were eliminated, and new
weights ranging from ±1 to ±7 were assigned to the responses
retained.

Reliability. Bernreuter then gave his inventory to new groups of
subjects and obtained the split-half reliability coefficients reported
in Table 64.


Table 64. Reliability Data for the Personality Inventory*


                           Men                          Women
Scale     Average    High     Col-     Adult     High     Col-     Adult
                     school   lege               school   lege

B1 N      .87        .88      .90      .89       .85      .84      .86
B2 S      .83        .78      .84      .83       .85      .84      .84
B3 I      .85        .87      .88      .91       .82      .83      .80
B4 D      .88        .87      .88      .88       .87      .89      .91


* From Bernreuter, R. G. The theory and construction of the personality inventory. J. soc.
Psychol., 1933, 4, 387-405.

Validity. Bernreuter’s thesis in constructing the Personality
Inventory was that one set of items could be weighted differentially
to do the work previously done by four separate tests. The data in
Table 65 show the extent to which Bernreuter was able to verify
this hypothesis. These figures show that the scales on the Personality
Inventory are highly correlated with the scores on the tests they
were designed to replace. We must agree with Bernreuter that one
set of items can be weighted differentially and can serve several
purposes at one time.


Table 65. Correlations between the Scores on the Personality Inventory and Those
of the Tests It Was Designed to Replace*


                      Fall quarter students          Winter quarter students
Bernreuter scale      Number    r          r_c       Number       r      r_c

[illegible]           70        .94        1.00      32           .91    .99
[illegible]           70        .89        1.00      46           .86    1.00
[illegible]           70        [illeg.]   .99       [illeg.]     .69    [illeg.]
[illegible]           55        [illeg.]   1.00      29           .67    [illeg.]
[illegible]           [a further row reads 55, .82, .99; its column position is illegible]


* From Bernreuter, R. G. The theory and construction of the personality inventory. J. soc.
Psychol., 1933, 4, 387-405.

But, granting this point, we can still ask, “Are the scales on the 
Personality Inventory valid for measuring neurotic tendency,
self-sufficiency, introversion-extroversion, and dominance-submission?”
Bernreuter offers no direct evidence in answer to this question. All 
that Bernreuter can claim is that if the original tests measure these 
traits in a valid way, then the Personality Inventory does so also. 
But if the original tests do not measure these traits in a valid way, 
then neither does the Personality Inventory. 

In Chap. 6 we discussed the validation of these original tests and 
found that, at best, it must be considered meager. Laird’s
Introversion-Extroversion Schedule and the Thurstones’ Personality
Schedule went through no validation process whatsoever. The Allports’
Ascendance-Submission Reaction Study and Bernreuter’s
Self-sufficiency Scale were validated against various sets of ratings, but
the validity coefficients obtained left much to be desired.

Criticisms. Three popular criticisms of the Bernreuter Personality 
Inventory are that the responses it elicits are due to chance, that 
they are slanted in a direction to win social approval, and that they 
are actually dishonest. Bernreuter agrees that these possibilities 
exist, but he feels that they do not vitiate completely, as his critics 
would have it, the value of the inventory scores.

Chance. Bernreuter cites the standard errors of estimate in
Table 66 as evidence that chance alone cannot account for the
variation in scores. He arrives at this conclusion by noting that
these standard errors of estimate, computed according to the formula
$\sigma_{est} = \sigma \sqrt{1 - r^2}$, are considerably smaller than the standard
deviations of the original raw-score distributions. If chance were
the only factor accounting for the variation in test scores, the
standard errors of estimate would be equal to the original distribution
standard deviations.


Table 66. Standard Errors of Estimate for the Scores on the Personality Inventory*


                           Men                          Women
Bernreuter      High     Col-     Adult     High     Col-     Adult
scale           school   lege               school   lege

B1 N            [illeg.] 18.2     16.9      20.8     21.5     20.4
B2 S            16.0     14.8     14.5      14.5     14.8     15.0
B3 I            11.4     12.2     10.0      3.8      12.8     14.0
B4 D            13.7     15.5     14.2      14.9     14.0     13.6


* From Bernreuter, R. G. The validity of the personality inventory. Person. J., 1933, 1,
383-386.


The formula for the standard error of estimate shows that if the
reliability of a test is 1.00, the standard error of a single score will be
0; that is, if a test is perfectly reliable, there will be no error. On the
other hand, if a test possesses no reliability, the standard error of a
single score will be equal to the standard deviation of the distribution,
and the test will not reliably distinguish one individual from
another. In other words, a person may secure one score today and a
completely unrelated score tomorrow. When the reliability of a
test is .866, our standard error of estimate formula shows that the
standard error of a single score is one-half the standard deviation
of the distribution. Therefore a reliability of .866 enables us to
reduce by 50 per cent our error in locating the score of a single
individual.
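
The three cases just described can be verified numerically. The sketch below assumes that the formula in question is the standard deviation multiplied by the square root of one minus the squared reliability, which is the form consistent with all three worked cases; the function name is ours.

```python
import math


def standard_error_of_estimate(sd: float, r: float) -> float:
    """Standard error of a single score: sd * sqrt(1 - r**2)."""
    return sd * math.sqrt(1.0 - r * r)


# Expressed as a fraction of the standard deviation (sd = 1):
perfect = standard_error_of_estimate(1.0, 1.0)     # no error at all
worthless = standard_error_of_estimate(1.0, 0.0)   # error equals the full sd
halved = standard_error_of_estimate(1.0, 0.866)    # very nearly one-half the sd
```

Because the reliability enters as a square, the error shrinks slowly at first: a reliability as high as .866 is needed merely to halve it.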

The standard errors of estimate in Table 66 average only four-tenths
of the original raw-score standard deviations. Therefore our
error in locating the score for a single individual is 60 per cent less
than if chance were the only factor causing score variation. Therefore,
says Bernreuter, chance does not account for the scores on the
Personality Inventory.

We can agree that this is true. But the elimination of chance as an
important explanatory factor does not mean that the non-chance
factors operating are those we choose to have operating. Could the
scores be due, we wonder, to a slanting of responses in a direction
to win social approval or could they be downright dishonest?


Social Approval. Bernreuter, to check upon the extent to which
responses might be slanted to win social approval, asked a group of
students to take the Personality Inventory under two different
conditions. The first of these conditions was the standard one, and
the second consisted of asking students to respond to the test items
in such a way as to win the greatest possible degree of social approval.
Bernreuter found the scores under these two conditions practically
uncorrelated. The correlation between the two series of “neurotic”
scores was -.07, that between the two series of “self-sufficiency”
scores was .19, and that between the two sets of “dominance”
scores was -.03. Under the second set of conditions students indicated
that social approval would go to emotionally stable,
self-sufficient, and dominant individuals. But the first answers of many
of the students who replied thus indicated emotional instability, 
lack of self-sufficiency, and submissiveness. This result, argues 
Bernreuter, shows that emotionally unstable individuals, individuals 
lacking in self-sufficiency, and submissive individuals will, in many 
cases, give responses that are not controlled by the desire to win 
social approval. 

Dishonesty. And now the last question. Is it possible that a subject 
will respond to the items in the Personality Inventory by indicating 
the kind of person he would like to be rather than the kind he ac- 
tually is? Bernreuter, to check upon this possibility, asked each 
one of several students who had taken the test under standard 
conditions to take it a second time and to indicate by his second 
responses the kind of a person he would most like to be. The scores 
secured under these two sets of conditions were compared with each 
other. Bernreuter found a correlation of .22 between the two series 
of “neurotic” scores, a correlation of .39 between the two series of
“self-sufficiency” scores, and a correlation of .14 between the two 
series of “dominance” scores. These correlations are sufficiently 
low, argues Bernreuter, to show that most subjects do not, under 
standard conditions, indicate what they want to be like rather than 
what they actually are like. 

We can grant, with Bernreuter, that chance does not account for 
all test-score variation and that many subjects will give honest 
responses. But Bernreuter has not demonstrated that dishonest 
answers can be dismissed as of infrequent occurrence. It would be 
desirable, therefore, if some method could be devised to indicate 
how honest or how dishonest any given set of responses is likely to
be. Floyd L. Ruch has attacked this problem and reports on what he 
calls an honesty scale for the Personality Inventory. This scale, 
applied to the responses secured under standard conditions, 
is supposed to indicate the degree of honesty or dishonesty 
involved. 

Ruch developed his honesty scale by comparing the responses of
245 subjects acting under “honest” and under “dishonest” conditions.
“Honest” conditions were the normal ones. “Dishonest”
conditions were induced with these instructions:


Imagine that you are applying for a position as salesman. Your showing in this
test will decide whether or not you get the job. You know the characteristics of a
good salesman. See if you can answer these questions as a good salesman would,
whether you really feel that way or not.


Ruch scored both the “honest” blanks and the “dishonest”
blanks on the introversion scale and found, as he expected, that the
results were quite different. On the first set of blanks, i.e., on those
completed “honestly,” the median score corresponded to the 50th
percentile for male college students. But on the second set of blanks,
i.e., on those completed “dishonestly,” the median score corresponded
to the 98th percentile (for extroversion) for male college
students. Ruch tabulated the number of “Yes,” “No,” and “?”
answers for each question under the “honest” and “dishonest”
conditions, computed the differences, and assigned scoring weights
in accord with these differences.

Then he had a second group of 100 students take the Personality
Inventory, first under “honest” and then under “dishonest” conditions.
Then he scored both sets of papers for introversion and for
honesty. He divided the students, upon the basis of their introversion
scores, into those more introverted than the average college
male and into those less introverted than the average college male.
The honesty scores for these two groups of subjects are given in
Table 67.


Under “honest” conditions honesty scores range from 15 to 60 
and have a median value of 39. But under “dishonest” conditions 
they range from 5 to 35 and have a median value of 14. Introverts 
changed their responses and their honesty scores to a remarkable 
extent. There is a clean-cut separation between their honesty scores 
in the “honest” and in the “dishonest” situations. If we can assume, 
with Ruch, that all introverts were originally honest, scores from 
35 and up can be taken as indicative of honesty, and scores from 34 
and down can be taken as indicative of dishonesty. However, we 
must note that if we adopt 35 asa critical score separating honest 
from dishonest answers, more than 50 per cent of the extroverts 
must be considered to have answered dishonestly even under hon- 


est” conditions. If this is not an artifact of the experiment, the data 
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Taste 67. Honesty Scores of Introverts and Extroverts under “Honest” and 
: “Dishonest” Conditions* 


Introverts Extroverts 
Honesty score |——— —|- Sa m 
| “Honest” |“ Dishonest” | “Honest” |“ Dishonest” 
60 1 
55 1 
50 5 1 
45 10 4 
40 16 11 
35 6 ac 11 
30 re 2 10 
25 5 ia 12 2 
20 a 3 11 4 
15 cr 17 1 19 
10 pe 14 og 31 
5 ais 3 w 5 
Totals Besdnsminen 39 39 61 61 


* From McNemar, Q., and Merrill, 


: M. A, (Eds.) Studies in Personality. New York: McGraw- 
Hill Book Company, Inc., 1942, 


lead to the conclusion that extroverts, as measured by the Personality
Inventory, are less honest than introverts. And this should
have important implications for the mea
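
The remark about a critical score of 35 can be checked with a little arithmetic. The sketch below uses the extrovert “honest” counts as read from Table 67; the variable names are illustrative only, and the cutoff is the one proposed in the text.

```python
# Extrovert counts under "honest" conditions, as read from Table 67
# (honesty score -> number of students).
extrovert_honest = {50: 1, 45: 4, 40: 11, 35: 11, 30: 10, 25: 12, 20: 11, 15: 1}

critical_score = 35  # scores of 35 and up are treated as "honest"

total = sum(extrovert_honest.values())
below = sum(count for score, count in extrovert_honest.items()
            if score < critical_score)
share_below = below / total

print(total, below, round(share_below, 3))
```

With these counts, 34 of the 61 extroverts fall below the critical score, which is the "more than 50 per cent" referred to above.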
Intercorrelations. The scales on the 


* From Bernreuter, R. G. The theory and construction of the Personality Inventory. J. soc.

factor-analysis approach, to reduce the number of scales on which
the Bernreuter test needs to be scored. Therefore he analyzed the 
intercorrelations in Table 68 in accord with Hotelling’s method of 
principal components. And he found that two independent factors 
would “account for all but about four percent of the non-chance 
variance.” 
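
How a small number of components can account for most of the variance in a set of intercorrelated scales may be easier to see in a short sketch. The correlation matrix below is hypothetical (it is not the Table 68 data), the function names are illustrative, and simple power iteration with deflation is used here as one convenient way to extract the largest components; it is not offered as a record of Flanagan's own computations.

```python
import math

def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def normalize(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

def dominant_component(matrix, iterations=500):
    """Return (eigenvalue, unit eigenvector) of the largest component."""
    vector = normalize([1.0 / (i + 1) for i in range(len(matrix))])
    for _ in range(iterations):
        vector = normalize(matvec(matrix, vector))
    eigenvalue = sum(a * b for a, b in zip(vector, matvec(matrix, vector)))
    return eigenvalue, vector

def deflate(matrix, eigenvalue, vector):
    """Remove an extracted component so that the next one can be found."""
    n = len(matrix)
    return [[matrix[i][j] - eigenvalue * vector[i] * vector[j]
             for j in range(n)] for i in range(n)]

# Hypothetical intercorrelations among four scales (NOT the Table 68 data).
R = [
    [1.0, 0.9, 0.3, 0.3],
    [0.9, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.9],
    [0.3, 0.3, 0.9, 1.0],
]

first, v1 = dominant_component(R)
second, v2 = dominant_component(deflate(R, first, v1))

# The trace of a correlation matrix equals the number of variables,
# so it is the total variance to be accounted for.
total_variance = sum(R[i][i] for i in range(len(R)))
proportion = (first + second) / total_variance
print(round(first, 3), round(second, 3), round(proportion, 3))
```

For this made-up matrix the two largest components account for 95 percent of the total variance, which is the kind of result the quotation above describes.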

“One of these factors,” says Flanagan, “may be interpreted as
distinguishing between the self-confident, well-adjusted,
socially-aggressive, ‘thick-skinned’ individual and the self-conscious, shy,
emotionally unstable individual.” Eight of the items which indicate
the character of this factor, which Flanagan calls self-confidence, are
given in Table 69.
TABLE 69. Items Related to Self-confidence*

Do you blush very often?
Do you feel self-conscious in the presence of superiors in the academic or business world?
Are you troubled with shyness?
Are your feelings easily hurt?
Do you often find that you cannot make up your mind until the time for action is passed?
Are you troubled with feelings of inferiority?
Do you have difficulty in starting a conversation with a stranger?
Are you troubled with the idea that people on the street are watching you?

* From Flanagan, J. C. Factor Analysis in the Study of Personality. Stanford University,
Calif.: Stanford University Press, 1935.

The second factor is “best described,” says Flanagan, “as
differentiating between the social and the non-social or independent.”
Eight items which indicate the nature of this factor, which Flanagan
calls sociability, are given in Table 70.
TABLE 70. Items Related to Sociability*

Do athletics interest you more than intellectual affairs?
Do you think you could become so absorbed in creative work that you would not notice a lack of intimate friends?
Do you prefer traveling with someone who will make all the necessary arrangements to the adventure of traveling alone?
Have books been more entertaining to you than companions?
Do you usually enjoy spending an evening alone?
Do you get as many ideas at the time of reading a book as you do from a discussion of it afterwards?
Do you prefer making hurried decisions alone?
Do you like to be with people a great deal?

* From Flanagan, J. C. Factor Analysis in the Study of Personality. Stanford University,
Calif.: Stanford University Press, 1935.
When Flanagan had isolated these factors and had reworked his
data to give each of his subjects scores on these new factors, he
picked those with scores above one standard deviation and those
with scores below one standard deviation as new criterion groups.
Then he compared these groups with each other (two on each
factor), and, using an item-analysis chart which he devised, he
assigned scoring weights to the various responses. When the tests
were rescored with these new item weights, Flanagan found values
which correlated .98 and .84 with the original factor scores. Using
the scores computed from the item weights, Flanagan repeated his
entire item-analysis procedure and this time secured scores which
correlated .98 and .91 with his original factor scores. Also, the
correlation between these two series of scores was now nearly zero,
whereas in the first trial it was .18. When Flanagan tried his keys
out on new groups of cases, he found reliabilities of .86 and .78 and
an interscale correlation of .04.
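
The selection of criterion groups described above can be sketched in a few lines. The factor scores here are invented, the function name is hypothetical, and reading “above one standard deviation” as “more than one standard deviation above the mean” (and likewise below) is an assumption about what was meant.

```python
from statistics import mean, pstdev

def criterion_groups(scores):
    """Return indices of subjects more than 1 SD above and below the mean."""
    m, sd = mean(scores), pstdev(scores)
    high = [i for i, s in enumerate(scores) if s > m + sd]
    low = [i for i, s in enumerate(scores) if s < m - sd]
    return high, low

# Hypothetical factor scores for eight subjects (mean 5, SD 2).
factor_scores = [2, 4, 4, 4, 5, 5, 7, 9]

high_group, low_group = criterion_groups(factor_scores)
print(high_group, low_group)
```

Only the subjects in the two extreme groups would then enter the item analysis; the middle of the distribution is set aside.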

Now, wondered Flanagan, how well could the scores on Bernreuter’s
original scales be predicted from these new scales? Very well,
indeed, as the correlations in Table 71 amply demonstrate. We can
TABLE 71. Multiple Correlations between Flanagan’s New Scales and the Original
Bernreuter Scales*

Neurotic tendency              .97
Self-sufficiency               .87
Introversion-extroversion      .95
Dominance... vAn AEE 
* From Flanagan, J. C. Factor Analysis in the Study of Personality. Stanford University, 
Calif.: Stanford University Press, 1935. 


agree with Flanagan “th 
contain such a larg 
isnot . . . worth 
factors.” But the ing the Bernreuter test 
for two independ a or four substantially 
correlated variable gment or to change the degree 
of validity which May not possess, 


THE PERSONAL AUDIT 


Our second example of a multidimensional approach to the measurement
of personality is contained in the Personal Audit, a test
devised by Clifford R. Adams and William L. Lepley. It is available
in two forms, LL and SS. Form LL consists of nine parts of 50 items
each, and Form SS consists of the first six parts of Form LL. The
parts of the test are relatively independent of each other and were 
designed to assist in “the measurement of those traits of personality 
essential to the vocation for which an individual is preparing.” 

According to Adams, these traits are seriousness-impulsiveness, 
firmness-indecision, tranquillity-irritability, frankness-evasion, sta- 
bility-instability, tolerance-intolerance, steadiness-emotionality, per- 
sistence-fluctuation, and contentment-worry. The scores on these 
traits are to be interpreted, says Adams, as follows: 


1. Seriousness-Impulsiveness. High scores indicate a serious disposition
characterized by quietness, ambition, and studiousness. Low scores indicate
pronounced sociability (or the need for it), aggressiveness, and dominance.

2. Firmness-Indecision. High scores indicate positiveness and conscientiousness.
The individual tends to be cooperative, poised and confident. Low scores indicate
a tendency to accept momentarily and impulsively suggestions of others. Frequently
this leads to an inability to make or maintain a decision.

3. Tranquillity-Irritability. High scores indicate evenness of temperament and
lack of irritability. There is little tendency to fly off the handle or become impatient.
Low scores indicate readiness and unevenness of response, often accompanied by
annoyance and fault-finding toward subordinates.

4. Frankness-Evasion. High scores indicate dependability, frankness, and
truthfulness. Low scores indicate unwillingness to face reality and inability to take
responsibilities.

5. Stability-Instability. High scores indicate pronounced confidence in self and
willingness to carry responsibilities. Low scores indicate a lack of self-confidence
accompanied by feelings of inferiority.

6. Tolerance-Intolerance. High scores indicate broadminded, easygoing attitudes.
Standards and ideals tend to be flexible, practical and realistic. Low scores indicate
strong attitudes, usually unfavorable, toward others. Intolerance and prejudice,
often disguised as high standards and ideals may be present.

7. Steadiness-Emotionality. High scores indicate normal ways of thinking.
Feelings are not intense. Low scores indicate that the individual is atypical. Usually
sensitive, feelings are volatile and deep-seated.

8. Persistence-Fluctuation. High scores indicate stable attitudes and interests
with little likelihood of pronounced changes occurring after age 25. Low scores
indicate that interests and attitudes are in a state of flux.

9. Contentment-Worry. High scores indicate few unsolved problems and absence
of worry about them if they do exist. The person is usually stable, cooperative, and
well adjusted to his work and social life. Low scores indicate worry, uneasiness, and
indecision brought about by unsolved problems. Lacking confidence, the individual
is usually uncertain and beset by conflicts often revolving around adjustments to
the opposite sex.

Five parts of the Personal Audit require a response of “much,”
“some,” “little,” or “no.” These parts are I, III, V, VI, and IX.
Part I requires a subject to indicate whether he has much, some,
little, or no liking for various activities. Part III requires a subject to
indicate whether he has much, some, little, or no dislike for
annoyances. Part V requires a subject to indicate whether he has much,
some, little, or no fear in reaction to various possible events. Part
VI requires the subject to indicate whether he has much, some,
little, or no dislike for activities not already given in Part I. And
Part IX requires a subject to indicate whether he has done much,
some, little, or no thinking on various topics.

Part II requires a subject to show whether he agrees, agrees with
reservations, or disagrees with different statements. Part IV requires
a subject to indicate whether he believes certain statements to be
true, doubtful, or false. Part VII is a word-association test which
requires the subject to indicate which one of four response words
best goes with the stimulus word. Finally, Part VIII requires the
subject to indicate if his feelings about a variety of subjects are the
same, partly different, or different from what they were three or four
years ago.



orms, we begin our discussion
with the third edition. This edition consisted of ten parts, each

ribe the sources of
not described for us the exact procedures by which he assigned items
to the various scales. We are left in doubt, therefore, as to whether
he was just fortunate in achieving these results or whether he went
through some of the same processes used by Kuder in the
development of his Preference Records (see Chap. 3).

In contemplation of the fourth revision of the Personal Audit,
Adams and Lepley gave the third form to 356 college students.
Using a priori scoring keys as before, they scored these tests and
determined which students fell into the upper and lower quarters
of each trait distribution. Next, they determined the mean rating
assigned to each item by the appropriate contrasting criterion
groups, computed the differences between these mean ratings, and
determined the significances of these differences by computing a
series of t ratios. Then, says Adams, items with t values less than 3
were eliminated. These processes resulted in the elimination of one
entire part of the test and, of course, reduced the number of items
available for the nine parts of the test which survived. To remedy
this, Adams and Lepley collected new items and finally achieved a
total of 60 items for each of the nine parts of the fourth revision of
the Personal Audit.
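
The item-analysis step can be illustrated with a small sketch. Adams does not say exactly how his t ratios were computed, so the formula below (difference between the two group means divided by the standard error of that difference, with separate variances) is an assumption, as are the numerical coding of the responses and the invented ratings.

```python
from math import sqrt
from statistics import mean, variance

def t_ratio(upper, lower):
    """Difference between group means divided by its standard error."""
    standard_error = sqrt(variance(upper) / len(upper)
                          + variance(lower) / len(lower))
    return (mean(upper) - mean(lower)) / standard_error

# Hypothetical ratings of one item by the two criterion groups
# (assumed coding: much=3, some=2, little=1, no=0).
upper_quarter = [3, 3, 2, 3, 2, 3, 3, 2]
lower_quarter = [1, 0, 1, 2, 1, 0, 1, 1]

t = t_ratio(upper_quarter, lower_quarter)
keep_item = t >= 3  # items with t values less than 3 were eliminated
print(round(t, 2), keep_item)
```

An item survives only if the upper and lower quarters on its own scale answer it very differently, which is why the procedure guarantees internal consistency but nothing more.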

The fourth revision of the Personal Audit was administered to 400
college students. Again, Adams and Lepley selected the students
falling into the upper and lower fourths of each trait distribution
and, again, went through a complete item analysis. This time,
however, each item was related to the total scores on each scale which
correlated .30 or more with the scale of which it formed a constituent
part. This made it possible for Adams and Lepley to eliminate items
with the lowest t values within a scale and items with the highest t
values in relation to other scales. Adams and Lepley eliminated 90
items, 10 from each scale, leaving 50 items in each of the nine parts
of the test. These items constitute the fifth and currently available
revision of the Personal Audit.

A final step consisted of giving the fifth revision to 231 high-school
boys and to 230 high-school girls. The 50 highest scoring boys and
50 lowest scoring boys, and the 50 highest scoring girls and 50 lowest
scoring girls were isolated as criterion groups, and another item
analysis was performed. This time only three items were found to
have a t value less than 3, so no further revision was, or has been,
attempted.
Validity. Adams discusses six sets of “validation” data. We cannot
agree that all the data which Adams discusses are relevant to the
validity problem, but we shall find it worth while to review Adams’s
arguments. The six types of data relate to the internal consistency
of the trait scores, the intercorrelations among the scales, con- 
sensus of opinion, ratings, correlations with other tests, and clinical 
evaluations. 

Internal Consistency. We see the criterion of internal consistency 
applied with a vengeance in the development of the Personal Audit 
test. This was, in fact, the only criterion used, and it was used 
repeatedly. As we have already indicated, all except three items have 
t values of 3 or more, indicating significant correlations with total 
scores. We can agree with Adams that his data show that each scale 
is internally consistent, but this certainly does not show that they 
are valid. 

Intercorrelations. The intercorrelations among the scales in the
Personal Audit are shown in Table 72. These correlations are based
upon the records of 442 college students not included in any of the
standardization groups. The correlations range from -.07 to .56
and have an average algebraic value of .12. Adams offers these data

TABLE 72. Intercorrelations among the Scales on the Personal Audit*

Seriousness-impulsiveness 90 0s tt — hI 
y [= E i= 2 
Firmness-indecision, , sil fe | = 00 | .04 |— .03|— .07 
Tranquillity-irritability... | ‘91 15 24 =:03|— 02 4 
Frankness-evasion,.... ‘ š s 45 | 56 -03| -05 5 
Stability-instability... 0. |, 90; 18) .17 | .08} .03| -1 
Tolerance-intolerance,...,. 96 | 33 04 .03| .30 
Steadiness-emotionality | -95 |—.07} .04) .30 
Persistency-fluctuation. , . 91| .18) .05 
Contentment-worry eet ee 93 of 
“a “92 
* From Adams, C. R. Manual of Directions for Using and Interpreting the Personal Audit.
Chicago: Science Research Associates, 1945.
as “evidence for validation by negation. . . .” He says that “since
the nine parts do not overlap one another to any appreciable
extent . . . nine relatively independent factors are measured.” We
can grant the last part of this statement to the effect that the scales
are relatively independent, but we cannot agree that this fact has
any relevance to the validity problem. To assert that a scale is valid
means that it must do a specific measuring job with some minimum
degree of accuracy. Proving that scales are independent of each
other in no way shows that they provide accurate measures of
personality traits which are said to be involved.

The diagonal entries in Table 72 represent reliability coefficients. 
These were computed by the split-half technique with the aid of the 
Spearman-Brown Prophecy Formula. 
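
As a reminder of what the Spearman-Brown step does, here is a minimal sketch of the formula in its general form, where a test is lengthened by some factor; for a split-half coefficient the factor is 2. The function name and the sample half-test correlation are illustrative and are not taken from Adams's data.

```python
def spearman_brown(half_test_r, length_factor=2):
    """Estimate the reliability of a test lengthened by `length_factor`."""
    return (length_factor * half_test_r
            / (1 + (length_factor - 1) * half_test_r))

# A hypothetical correlation of .82 between the two halves of a scale.
full_length_reliability = spearman_brown(0.82)
print(round(full_length_reliability, 2))
```

The stepped-up value is always higher than the half-test correlation (for positive correlations below 1), so reliabilities reported this way describe the full-length scale rather than either half.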

Consensus of Opinion. Adams asked 30 psychologists, caliber not
specified, to tell him what each part of the Personal Audit appeared
to measure. He says that their combined judgment coincides with
the descriptions he himself has prepared. All that this indicates, we
fear, is a common-sense agreement upon names for the several
traits. But just because we can agree on the name for a trait is no
proof that we can accurately measure it. Consensus on trait names
is completely irrelevant to the fundamental validity problem.

Ratings. Adams discusses five sets of rating data, but,
unfortunately, he is vague on many of the details which we need in order
to make an adequate evaluation of the procedures which he used.
He cites, for example, a study by Mrs. C. R. Adams. She “asked
nine teachers and four upper class students to identify the boys and
girls in the four high school grades (N = 461: 231 boys, 230 girls)
who represented extremes on 12 different personality traits.
Although the differences found were small they tended,” says C. R.
Adams, “to support the descriptions of Audit traits as tentatively
given. . . .” The number of students rated on each of the 12 traits
varies from 17 to 23, but little other information of value can be
gathered from Adams’s account.

The second set of rating data mentioned by Adams was secured
by Reppert and Borow. About the only information which Adams
gives is that “correlations between Audit scores and ‘personality’
ratings of 120 chemical operator trainees were . . . small. . . .”
Again, we are left in the dark as to the nature of the ratings and so
cannot properly make any evaluation of them.

The third set of data refers to two groups of “clerical employees
in three government agencies.” In these groups, 50 employees were
selected as being “most unsatisfactory” in their jobs, and 30 as
“most satisfactory” or “excellent.” Adams states that on Frank- 
ness-Evasion the satisfactory employees received a percentile aver- 
age of 54 and that the unsatisfactory employees received a percentile 
average of 39. Furthermore, Adams reports that the percentile 
scores for satisfactory employees ranged from 11 to 99 but that 
those for the unsatisfactory employees ranged from 3 to 60. 

We can accept with Adams the fact that the scores on Frankness- 
Evasion seem to differentiate the satisfactory employee from the 
unsatisfactory employee. But this hardly has any bearing upon the 
question as to how valid the scores in Part IV may be for the 
measurement of that which it is asserted to measure, namely, 
frankness-evasion. 

Adams’s fourth set of rating data is presented by Gilliard. He 
“compared Audit scores of 100 R.O.T.C. students judged by their 
officers to be leaders with 100 R.O.T.C. students judged not to be 
leaders. . . . The leaders tended to be more stable, more steady 
and less emotional, and more persistent and less cycloidal.” 

And, finally, Adams says that he has found “happy husbands . . .
high on Tranquillity, Frankness and Tolerance,” and “happy wives
. . . high on Frankness, Stability and Contentment.” Data for
unhappy husbands and unhappy wives Adams fails to give, however.


In no one of these five sets of rating data can we find much
evidence strictly relevant to the problem of determining the validity
of the scores on the Personal Audit. Where, for example, do we find
any data showing that the traits measured by the Personal Audit
are “essential to the vocation for which an individual is preparing”?
And, after all, Adams did state this as being the major purpose of
the test.


ersonal Audit scor Je might, of 
» make an exception in ore a 


Vocational 




Interest Test, for Strong has validated his scales against occupational 
groups of known composition. But Strong certainly has never 
claimed (nor has anyone else) that the Vocational Interest Test can 
serve as a basis for the validation of a personality test of the type 
represented in the Personal Audit.

Clinical Evaluations. Adams offers as his last validating procedure 
a comparison between 100 maladjusted individuals and 100 adjusted 
individuals. Adams selected the 100 maladjusted individuals “from 
his clinical practice” and the controls from (presumably) routine 
college records. He says that comparisons between these two groups 


justify (among others) the following conclusions:

1. That seriously maladjusted students are characterized by extreme deviations
from the means

2. That fearful, anxious and depressed subjects make low scores on Firmness,
Frankness and Stability

3. That stubborn, aggressive and generally obnoxious individuals make low
scores on Seriousness, Tranquillity, and Frankness; high scores on Stability,
Persistence and Contentment

4. That cases characterized by lying, hallucinations and delusions make low
scores on Frankness, Steadiness and Firmness and Stability; high scores on
Seriousness and Persistence

These few facts hardly give us any ground for being enthusiastic
about the validity of the scores on the Personal Audit. We must
consider the results as a striking condemnation of the utility of the
criterion of internal consistency as the sole standard of
personality-test construction.


THE MINNESOTA PERSONALITY SCALE 


Our third example of a multidimensional approach to the
measurement of personality is that contained in the Minnesota Personality
Scale. This scale, constructed by John G. Darley and Walter J.
McNamara, was designed by them to do in a more efficient manner
the work done by a number of previously published tests. These
tests were the Minnesota Scale for the Survey of Opinions
(Rundquist and Sletto), the Bell Adjustment Inventory, and the Minnesota
Inventories of Social Attitudes (Williamson and Darley). These
tests, collectively, yield 13 scores. It was Darley and McNamara’s
purpose to reduce the number of these scores without sacrifice of
useful information.

The Minnesota Personality Scale consists of 218 items and is
divided into five parts. Parts I and V contain impersonal statements
which require a subject to check one of these five responses:
“strongly agree,” “agree,” “undecided,” “disagree,” or “strongly
disagree.” Sample items from Part I read, “Life is just a series of
disappointments,” and “A high school education makes a man a
better citizen.” Sample items from Part V read, “Most great
fortunes are made honestly,” and “Cooperative housing plans should
be encouraged.” Parts II, III, and IV contain personal questions
which require a subject to check one of these responses: “almost
always,” “frequently,” “occasionally,” “rarely,” or “almost
never.” Sample items from Part II read, “Are you eager to make
new friends?” and “Do you dislike social affairs?” Sample items
from Part III read, “Do you become nervous at home?” and “Was
your father your ideal of manhood?” Sample items from Part IV
read, “Are your feelings easily hurt?” and “Are your eyes sensitive
to light?” We see that the items in Parts I and V are phrased as
statements and that items in Parts II, III, and IV are phrased as
questions.

The Minnesota Personality Scale yields five scores, one based on
each part. Each item contributes to one score only and not to several,
as in the case of the Bernreuter Personality Inventory. The traits
measured by the Minnesota Personality Scale as defined by Darley
and McNamara are as follows:

Part I. Morale. High scores are indicative of belief in society’s institutions and
future possibilities. Low scores usually indicate cynicism or lack of hope in the
future.

Part II. Social Adjustment. High scores tend to be characteristic of the gregarious,
socially mature individual in relations with other people. Low scores are
characteristic of the socially inept or undersocialized individual.

Part III. Family Relations. High scores usually signify friendly and healthy
parent-child relations. Low scores suggest conflicts or maladjustments in
parent-child relations.

Part IV. Emotionality. High scores are representative of emotionally stable and
self-possessed individuals. Low scores may result from anxiety states or
over-reactive tendencies.

Part V. Economic Conservatism. High scores indicate conservative economic
attitudes. Low scores reveal a tendency toward liberal or radical points of view on
current economic and industrial problems.
The first step in the development of the Minnesota Personality
Scale was that of giving to several hundred subjects the tests it was
designed to replace. These tests, the Minnesota Scale for the Survey
of Opinions, the Bell Adjustment Inventory, and the Minnesota
Inventories of Social Attitudes, were given in 1935 and again in 1936.
They were given to 326 men and to 217 women at the University 
of Minnesota. From these records Darley and McNamara selected 
for intensive analysis 100 test and 100 retest records for men and 
100 test and 100 retest records for women. 


The second step in the development of the Minnesota Personality
Scale was that of analyzing the test-retest correlations on each of the
13 test variables. These variables, together with their test-retest
correlations (over a nine-month interval), are given in Table 73.


TABLE 73. Test-Retest Correlations on 13 Personality Variables*

Scale                          Men    Women

65 63 
Inferiorit í -61 E] 
Attitude toward fami 64 76 
Attitude toward the leg: 55 257 
Economic conservatism 79 59 
Education... 46 63 
General adjustment. - - ++ bs a A l 64 
Home adjustment. +-+- co wal 71 82 
Health adjustment... ++ ++ 22 81 
Social adjustment. - - ca usara as .78 
Emotional adjustment. «++ 68 70 
Social preferences. «= t a B 62 

Ot 69 


Social behavior... -+-+ sae e oe | 


* From Darley, J. G. Changes in measured attitudes and adjustments. J. soc. Psychol.,
1938, 9, 189-199.

Darley and McNamara decided that these coefficients “revealed
a reasonable degree of stability” and that because of this
“reasonable degree of stability,” it would be desirable for them to proceed
further with their plans.

The third step in the development of the Minnesota Personality 
Scale was that of computing the intercorrelations among the 13 test 
variables. These correlations were computed separately for the test 
and retest scores and separately, also, for men and women. The 
results are presented in Tables 74 and 75. Table 74 shows the
intercorrelations for men, and Table 75 shows the intercorrelations for
women. In both tables the figures above the diagonal are based
upon the original test scores, and those below the diagonal are based
upon the retest data.

The fourth step in the development of the Minnesota Personality
Scale was that of performing centroid factor analyses upon the four
intercorrelational matrices presented in Tables 74 and 75.


TABLE 74. Intercorrelations among 13 Personality Variables (Men)*†

1 -50| 30| 48] 19} 43] 74! 26] osl .29| 34| 20) .30 
2) 63 17) .24 | 12| .23) 45] 17 |— o6) sa| 28| .22| 48 
3 .42| .32 «37 |—.16) .32| 40) .40 |- 22| 14| 09| 29| .18 
4 -54| .30 | 38 -21| 40} .58/ 15 | os| 13| 28| o6) .14 
5 14} .18 |—.03| .29 -08| .39] 06 | o2) 17| 22| .20| .14 
6 -47| .23 | 24| 44 | gs -52| .10 |— .06|—.15| .01 |—.06|—.04 
7 -76| .57 | 38| .55 | 40] 54 25) 11) 28] 0] a .29 
8 -35| .35 | .60) .24 | .12/~.02| 25 32) 31) .48| 32] .18 
9 |=.01} 14 | 04) .08 |—o9|— 15] ‘99 28 .16| .38| .20) .13 
10 34) 60) 07) 05 | os) 11] 34l 96 Er «46| 43| .69 
ul 31 -55 | .22/ .26 | .osl— os} “91 “sg 41) s4 -20| .32 
12 BY) 38 2 12) ai] aiad aa “1g 03| .49| .23 49 
13 34) -61 | 18] 14) 06] id “291 “2p 16.77] 43. | .66 
* Correlations above the diagonal are based on original test data and those below the
diagonal are based on retest data.
† From Darley, J. G., and McNamara, W. J. Factor analysis in the establishment of new
personality tests. J. educ. Psychol., 1940, 31, 321-334.


the “problem was to 
at all would be accounted 


nment which appeared 
mara give as follows: 
IV. Health adjustment, emotional adjustment
V. Economic conservatism 


Darley and McNamara conclude “that the thirteen separate
scores in the battery can be accounted for by five psychologically
meaningful factors, and that these factors [are] sufficiently stable
from test to retest to represent significant aspects of personality.”

TABLE 75. Intercorrelations among 13 Personality Variables (Women)*†

1 58 | .54 sel .86| .36| -04| .25 | .32 | -20 | .17 
2 64] |40 36| sol .32| .15] -47 | .42 | .29 | .50 
3 a7 | 4S 32| e| .58| 11| .23 | -21 | .02 | .15 
4 62 | .33 | .37 sıl .67| .27| -10| .09 | .18 | .16 | .02 
5 12 | .02 | .23 isl .38| .14| .07} .08 | .04 | -19 | .18 
6 56. | 19 | .19 58|—.02/—.08| .07 | .11 | .04 | .01 
7 | 80 | 152} 54 Sf 33.05] .19 | .28 | .13 | .15 
8 "96 | 33 | 71 06] .33 38] .23 | .50 | .11 | .20 
9 Te) i) at |S .02|—.03| -39 05 | .38 | .06 | .06 
10 |i | so] .20 |- 14| 08| .26} -17 «42 | .42 | .67 
11 "za | 50| .43 4| 33| -65| -53| -483 15 | .34 
a {30} 21.20 os| .14| -00| .06| .28 | .05 58 
13 El gal a os) .32| -23| -14| -72 | -40 | .40 
* Correlations above the diagonal are based on original test data and those below the
diagonal are based on retest data.
† From Darley, J. G., and McNamara, W. J. Factor analysis in the establishment of new
personality tests. J. educ. Psychol., 1940, 31, 321-334.


The fifth step in the development of the Minnesota Personality
Scale was that of converting the raw scores on each of the 13 original
variables into standard scores. This done, these scores were
appropriately weighted and summed to give scores on the five factor
variables. These scores were then intercorrelated with the results
shown in Table 76. Theoretically, all these intercorrelations should
be zero. They are not zero, but they are low enough to suggest that
five different aspects of personality are under consideration.
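
The conversion to standard scores and the weighted summation can be sketched as follows. The raw scores and the equal weights are invented for illustration (the weights Darley and McNamara actually used are not given here); only the pairing of health adjustment with emotional adjustment follows the factor list above.

```python
from statistics import mean, pstdev

def standard_scores(raw):
    """Convert raw scores to standard (z) scores."""
    m, sd = mean(raw), pstdev(raw)
    return [(x - m) / sd for x in raw]

def factor_scores(raw_by_variable, weights):
    """Weight and sum the standard scores of the variables in one factor."""
    z = {name: standard_scores(raw_by_variable[name]) for name in weights}
    n_subjects = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in weights)
            for i in range(n_subjects)]

# Hypothetical raw scores for three subjects on two of the 13 variables.
raw_by_variable = {
    "health adjustment": [10, 20, 30],
    "emotional adjustment": [4, 6, 8],
}
# Hypothetical equal weights; the real weights came from the factor analysis.
weights = {"health adjustment": 0.5, "emotional adjustment": 0.5}

emotionality = factor_scores(raw_by_variable, weights)
print([round(score, 2) for score in emotionality])
```

Standardizing first puts variables with very different raw-score ranges on a common footing, so that the weights, and not the arbitrary scale units, decide how much each variable contributes to the factor score.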
The sixth step in the development of the Minnesota Personality
Scale was that of reducing the number of items needed to produce the
five factor scores. Darley and McNamara selected separately for
each factor and separately for men and women the 25 highest scoring
subjects and the 25 lowest scoring subjects to serve as criterion
groups. Then they determined how many subjects in each of these
groups gave each of the alternate answers to the various items.
Having done this, they applied the Rundquist-Sletto internal
consistency technique and found 302 items (out of 368) which yielded
critical ratios of 2.0 or more and, of these, 223 which yielded critical
ratios of 3.0 or more. Darley and McNamara reviewed these items
carefully, eliminated unnecessary duplications, made editorial
changes in some of them, and, finally, added 21 new items. This gave
them a total of 290 items for further tryout.


TABLE 76. Intercorrelations among the Factor Scores Underlying the Minnesota
Personality Scale*

                                   100 men         100 women
Factors intercorrelated          Test  Retest     Test  Retest

Morale vs. social adjustment 29 ao | 20 32 
Morale vs. family relations 39 37 45 „39 
Morale vs. emotionality eA 19 10 18 18 
Morale vs, economic conservatism a7 28 39 20 
Social adjustment vs. family relations. 32 28 29 .32 
Social adjustment ys. emotionality... 33 Al i. «at 
Social adjustment vs. economic conservatism, . | 23 | P Fei 09 
Family relations vs, emotionality à 40 38 Ay 50 
Family relations vs, economic conservatism, . . “| —.05 | .04 16 -18 
Emotionality vs. economic conservatism enamine. peal 13 — 01 05 —.14 
* From Darley, J. G., and McNamara, W. J. Manual of Directions, Minnesota Personality
Scale. New York: The Psychological Corporation, 1941.


The seventh step in the development of the Minnesota Personality Scale was that
of [...] into a new test form. This [...] 100 women at Rochester [...] College,
and the University [...]

[...] items surviving the first item analysis [...] analysis. Darley and
McNamara [...] explicitly, but we [...] analysis was the same [...] mean that
the 25 h[...] and 25 lowest scoring women served as the criterion groups. When
the results from this second item analysis became available, Darley and
McNamara proceeded to eliminate from any further consideration those items
falling into either one of the following two categories:


1. Items with critical ratios below 3.00 in all analyses

2. Items with critical ratios above 3.00 in the second analysis but with
   critical ratios below 3.00 in the original analyses (both test and retest)
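
The two elimination rules can be stated as a small decision function. This is a sketch under stated assumptions: critical ratios are treated as magnitudes, a ratio of exactly 3.00 is treated as meeting the cutoff (the source does not say), and the function and argument names are hypothetical.

```python
def retain_item(cr_test: float, cr_retest: float, cr_second: float,
                cutoff: float = 3.00) -> bool:
    """Return True if an item survives both elimination rules."""
    below_in_originals = cr_test < cutoff and cr_retest < cutoff
    below_in_second = cr_second < cutoff

    # Category 1: below the cutoff in all analyses.
    if below_in_originals and below_in_second:
        return False
    # Category 2: at or above the cutoff in the second analysis, but below it
    # in both original analyses (test and retest).
    if below_in_originals and not below_in_second:
        return False
    return True


# Hypothetical critical ratios: (test, retest, second analysis).
examples = {
    "item_a": (3.5, 3.2, 3.1),
    "item_b": (2.0, 2.5, 1.0),
    "item_c": (2.0, 2.5, 4.0),
    "item_d": (3.5, 2.0, 2.0),
}
decisions = {name: retain_item(*crs) for name, crs in examples.items()}
print(decisions)
```

Read in this way, the two categories together remove every item that failed to reach the cutoff in both of the original analyses, whatever its showing in the second analysis.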


The ninth step in the development of the Minnesota Personality Scale was that
of rescoring 200 test papers on the 218 items which met all criteria for
retention in the scale. And, coupled with this, it included the giving of the
final form of the test to 577 men and 557 women upon their entrance to the
College of Science, Literature, and the Arts of the University of Minnesota.
When all these papers were scored, the intercorrelations presented in Table 77
were obtained.

Table 77. Intercorrelations among the Scores on the Second Edition of the
Minnesota Personality Scale*

                                                 100     100     577     557
Factors intercorrelated                          men    women    men    women

Morale vs. social adjustment                     [?]     [?]     .41     .36
Morale vs. family relations                      [?]     .50     .26     .34
Morale vs. emotionality                          .41     .53     .38     .38
Morale vs. economic conservatism                 .28     [?]     [?]     .18
Social adjustment vs. family relations           .37     [?]     .25     .26
Social adjustment vs. emotionality               [?]     .47     .53     .48
Social adjustment vs. economic conservatism      .05     .12     .17     .13
Family relations vs. emotionality                [?]     .42     .52     .54
Family relations vs. economic conservatism       [?]     .18     .24     .16
Emotionality vs. economic conservatism           [?]     [?]     [?]     [?]

* From Darley, J. G., and McNamara, W. J. Manual of Directions, Minnesota
Personality Scale. New York: The Psychological Corporation, 1941.


We see that these correlations are very similar to those in Table 76. 

The tenth step in the development of the Minnesota Personality Scale was that
of determining score reliabilities. The stepped-up Spearman-Brown coefficients
computed by Darley and McNamara are given in Table 78. We see that they range
from a low of .84 to a high of .97 and that they average .93. Since each
coefficient is based upon a relatively small number of items, we can consider
these reliabilities as eminently satisfactory.

Table 78. Reliability Data for the Minnesota Personality Scale*

                                     Number of items    Corrected coefficients
Scale                                 Men     Women        Men      Women

Part I. Morale                        40       44          .84       [?]
Part II. Social adjustment            61       [?]         .97       [?]
Part III. Family relations            30       36          .95       .95
Part IV. Emotionality                 35       [?]         .94       .93
Part V. Economic conservatism         [?]      32          .92       .92

* From Darley, J. G., and McNamara, W. J. Manual of Directions, Minnesota
Personality Scale. New York: The Psychological Corporation, 1941.
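
The "stepping up" of these coefficients follows the Spearman-Brown formula, which estimates the reliability of a test lengthened k times from the correlation r between its parts: r_kk = kr / (1 + (k - 1)r). For two halves, k = 2. The sketch below is illustrative only; the half-test correlation used is hypothetical and is not taken from Darley and McNamara's data.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Estimated reliability of a test lengthened k times.

    r is the correlation between comparable parts of the test
    (for example, two halves), and k is the lengthening factor.
    """
    return k * r / (1 + (k - 1) * r)


# Hypothetical correlation between two halves of a scale.
half_test_r = 0.87
full_test_reliability = spearman_brown(half_test_r)
print(round(full_test_reliability, 2))
```

The formula shows why a coefficient based on half the items understates the reliability of the whole scale, and why the correction is larger when the half-test correlation is lower.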


The eleventh and last step in the development of the Minnesota Personality
Scale was that of preparing norms for the interpretation of scores. Darley and
McNamara prepared separate norms for men (N = 1083) and women (N = 888) and for
hand-scored and machine-scored editions of the test. These norms are given in
their Manual of Directions.

Darley and McNamara conclude their discussion of the steps in the development
of the Minnesota Personality Scale by saying their

. . . procedures . . . have resulted in: a smaller number of tests for the
counselor to interpret in diagnosing five important aspects of personality; a
smaller and more homogeneous number of items in each of these tests than in the
groupings of tests from which the items were derived; and a higher set of
reliability coefficients than was characteristic of the original scales.


THE GUILFORD INVENTORIES 


[...] “The Guilford-Martin Personnel Inventory I,” contains 150 questions and
provides for the measurement of objectivity, agreeableness, and
cooperativeness. And the third inventory, “The Guilford-Martin Inventory of
Factors GAMIN,” contains 186 questions and provides for the measurement of
general activity, ascendance-submission, masculinity-femininity, inferiority
feelings, and nervousness. Guilford’s definitions of these 13 traits are as
follows:


Social Introversion-Extroversion. A high score indicates sociability, a tendency
to seek social contacts and to enjoy the company of others. A low score
indicates shyness, a tendency to withdraw from social situations and to be
seclusive. A high score is more desirable for mental health than is a low
score. A very low score indicates a need for guidance toward increased social
participation.

Thinking Introversion-Extroversion. A high score indicates a lack of
introspectiveness and an extrovertive orientation of the thinking processes. A
low score indicates an inclination to meditative thinking, philosophizing,
analyzing one’s self and others, and an introspective disposition. The middle
range of scores is more desirable for mental health than either extreme. Each
extreme, however, may have its value for certain types of occupation.

Depression. A high score indicates freedom from depression, a cheerful,
optimistic disposition. A low score indicates a chronically depressed mood
including feelings of unworthiness and guilt. The higher the score the better
is likely to be the emotional adjustment of the individual.

Cycloid Disposition. A high score indicates stable emotional reactions and
moods, and freedom from cycloid tendencies. A low score means the presence of
cycloid tendencies as shown in strong emotional reactions, fluctuations in
mood, and a disposition toward flightiness and instability. The higher the
score the better will be the emotional adjustment of the individual, except
that scores too high may indicate a colorless, inert individual.

Rhathymia. A high score indicates a happy-go-lucky or carefree disposition,
liveliness and impulsiveness. A low score indicates an inhibited disposition
and an overcontrol of the impulses. Both extremes of scores may represent
psychological maladjustments and a score in the middle range is desirable for
mental health.

Objectivity. A high score on this trait indicates a tendency to view one’s self
and surroundings objectively and dispassionately. A low score indicates a
tendency to take everything personally and subjectively and to be
hypersensitive. The higher the score the better for mental health.

Cooperativeness. A high score indicates a willingness to accept things and
people as they are and a generally tolerant attitude. A low score indicates an
overcriticalness of people and things and an intolerant attitude. The higher
the score the better for mental health unless the score on General Activity or
clinical signs indicate a torpid and sluggish condition to be the basis of the
lack of criticalness. Overcriticalness is often a compensation for hidden
feelings of inadequacy. Pathological cases may exhibit a paranoid projection of
their conflicts and impulses.

Agreeableness. A high score indicates an agreeable lack of quarrelsomeness and
a lack of domineering qualities. A low score indicates a belligerent,
domineering attitude and an overreadiness to fight over trifles. Very low
scores indicate an extreme craving for superiority as an end in itself
developed as a compensation for some chronic frustration and in pathological
cases may lead to paranoid delusions of grandeur. It is possible that a
sadistic component may occur in some of the pathological cases.

General Activity. A high score indicates a tendency to engage in vigorous overt 
action. A low score indicates a tendency to inertness and a disinclination for motor 
activity. An extremely high score may represent a manic tendency while an ex- 
tremely low score may be an indication of a hypothyroid condition or other causes 
of inactivity. Thus, for good mental health a score in the middle range is usually 
most desirable. 

Ascendance-Submission. A high score indicates social leadership and a low score 
social passiveness. The score of a person on this trait must be interpreted in the 
light of his other characteristics of temperament, and no general rule can be set 
forth as to what scores are most desirable for mental health. 

Masculinity-Femininity. A high score on this trait indicates masculinity of emo- 
tional and temperamental make-up and a low score indicates femininity. The scores 
of the majority of males are above 5 and the majority of females have scores below 5. 
Males whose scores are very low are sometimes found either to lack their full quota 
of male hormones or to have an oversupply of female hormones. 

Inferiority Feelings. A high score indicates self-confidence and a lack of inferiority 
feelings. A low score indicates a lack of confidence, underevaluation of one’s self, 
and feelings of inadequacy and inferiority. The higher the score the better for mental 
health, with the exception of extremely high cases in which clinical investigation 
may reveal a superiority compensation for hidden inferiority feelings. Many psycho- 
neurotics have very low scores. 


Nervousness. A high score indicates a tendency to be calm, unruffled, and
relaxed; a low score indicates jumpiness, jitteriness, and a tendency to be
easily distracted, irritated, and annoyed. The higher the score the better for
mental health unless there are clinical indications that an overly sluggish and
torpid condition is the basis for an extremely high score. Extremely low scores
in some cases may involve a lack of calcium in the blood. In many cases, a
mental conflict may be the basis for the emotional tension expressed in
jitteriness and irritability.


Development of the Inventories. We shall see in the development of the Guilford
inventories an extensive use of the technique of factor analysis. We have
already encountered this technique in our discussion of the Bernreuter
Personality Inventory and the Minnesota Personality Scale. But in these two
instances the factor analyses were based upon intercorrelations among test
scores while in the present instance they will be based upon intercorrelations
among test items.

The Nebraska Personality Inventory. Work was begun on what eventually became
the Guilford inventories sometime before 1936. Professor J. P. Guilford and his
wife, Ruth B. Guilford, consulted a number of sources pertinent to the
measurement of introversion-extroversion. They reviewed Jung’s writings,
Freyd’s monograph “Introversion-Extroversion,” and the Laird, Marston,
Neyman-Kohlstedt, and Gilliland and Morgan introversion-extroversion
inventories. In these sources they found 75 “unrepeated” items. From these
unrepeated items they selected 35 as representing the “essence” of
introversion-extroversion. These are given in Table 79. These 35 items were
those which showed the least duplication of content from one item to another
and were those upon which the consulted authorities showed most agreement. All
items selected were mentioned by three or more of these authors.

The Guilfords attempted to word the items so that an answer of “yes” would
indicate introversion as often as it would indicate extroversion and so that
both the “yes” and “no” answers would appear equally desirable. Furthermore,
they included the phrases “Do you like...” and “Are you inclined to...” on the
theory that subjects would be more honest in indicating their likes and
dislikes, inclinations and disinclinations, than they would be in reporting
actual behavior.

The Guilfords gave this list of questions to 930 subjects (430 men and 500
women) and one month later asked 277 of these subjects (163 men and 114 women)
to take the test again. The directions accompanying the list were as follows:


Below you will find 36 questions which are to be answered either “Yes” or “No.”
Read each question in turn. Think what your behavior has usually been and
underline either “Yes” or “No” whichever answer describes your behavior better.
If you cannot decide, then guess. Be sure to answer every question. There is no
implication of right or wrong in any of these items.

The Guilfords’ next step was to compute a series of contingency coefficients
showing how each item was related to each of the other items in the list. This
meant the computation of 630 coefficients. These coefficients were found to
range, with two exceptions, between -.50 and .50. The Guilfords subjected these
coefficients to a Spearman-type factor analysis to determine whether one factor
could account for all the correlations. They found a large general factor but a
number of specific factors also. They concluded that the 36 items could not be
said to lie along just one linear dimension. Actually they extracted 18 factors
from their interitem correlational matrix.
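
The figure of 630 follows from the number of distinct pairs that can be formed from 36 items, namely 36 x 35 / 2. A minimal check, with hypothetical names, is sketched below.

```python
from itertools import combinations


def pair_count(n_items: int) -> int:
    """Number of distinct item pairs, and hence of interitem coefficients."""
    return n_items * (n_items - 1) // 2


# Enumerate the pairs explicitly for the 36-item list (items numbered 1 to 36).
item_pairs = list(combinations(range(1, 37), 2))

print(pair_count(36), len(item_pairs))
```

The same count governs any interitem matrix, which is why the labor of such analyses grows so quickly as items are added.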


Later the Guilfords replaced their contingency coefficients by computing
tetrachoric intercorrelations. These were found to average higher than the
contingency coefficients and ranged in value from -.60 to .69. These were
subjected to a Thurstone centroid factor analysis, and five psychologically
meaningful factors were extracted.

Table 79. Items Which the Guilfords Selected to Represent the “Essence” of
Introversion-Extroversion*

 1. Do you express yourself better in speech than in writing?
 2. Are you inclined to limit your acquaintances to a select few?
 3. Do you generally prefer to take the lead in group activities?
 4. Do you prefer to read about a thing rather than experience it?
 5. Do you like work which requires considerable attention to details?
 6. Are you generally very particular about your personal property, i.e., do
    you take very good care of your things?
 7. Are you inclined to be considerate of other people’s feelings?
 8. Are you inclined to act on the spur of the moment without thinking things
    over?
 9. Have you ever kept a personal diary of your own accord?
10. Do you work much better when you are praised?
11. Do you like to change from one type of work to another frequently?
12. Are you inclined to study the motives of others?
13. Do you daydream frequently?
14. Do you prefer to work with others rather than alone?
15. Are you inclined to worry over possible misfortunes?
16. Are you frequently somewhat absent-minded?
17. Do you like to persuade others to your point of view?
18. Are you inclined to keep in the background on social occasions?
19. Are you more interested in athletics than intellectual things?
20. Do you usually dislike to change opinions you have already formed?
21. Do you like to speak in public?
22. Do you prefer to work things out for yourself rather than accept
    suggestions from others?
23. Do you have frequent ups and downs in mood, either with or without apparent
    cause?
24. Are you inclined to be slow and deliberate in movement?
25. Are your feelings rather easily hurt?
26. Do you enjoy getting acquainted with most people?
27. Are you inclined to keep quiet when out in company?
28. Do you adapt yourself easily to new conditions, i.e., to new environments,
    situations, places, etc.?
29. Do you like to confide in others?
30. Do you express such emotions as delight, sorrow, anger, etc., readily?
31. Are you inclined to think about yourself much of the time?
32. Do you like to have people watch you when you are working?
33. Do you frequently rewrite social letters before mailing them?
34. Do you like to sell things?
35. Do you get rattled easily in exciting situations?
36. Are you a male?

* From Guilford, J. P., and Guilford, R. B. An analysis of the factors in a
typical test of introversion-extroversion. J. abn. soc. Psychol., 1934, 28,
377-399.


The first of these factors appeared to be social introversion-extroversion. “At
one end of the scale the individual seeks to withdraw, to remove himself from
social contacts and social responsibilities; at the other end of the scale he
seeks social contacts and depends upon them for his satisfactions.”

The second factor appeared to be an emotionality factor. “Running throughout
the list of characteristics is a thread of emotional immaturity or emotional
dependency. The individual having those traits would seem to lack
self-sufficiency.”

The third factor appeared to represent a masculine ideal, that is, maleness,
aggressiveness, dominance, and so forth.

The fourth factor appeared, say the Guilfords, to be a “happy-go-lucky” factor.
One end of the scale represented the slow, methodical, deliberate person, and
the other end of the scale represented the careless person, the person careless
about dress, personal property, and the feelings of others. It is best
described, say the Guilfords, by the Greek word rhathymia.

And finally, the fifth factor indicated a liking for thinking and for tackling
problems requiring thought as opposed to a liking for prompt overt action. It
can be called, say the Guilfords, a thinking introversion-extroversion factor.

Following this factor analysis, the next step essayed by the Guilfords was that
of building more adequate scales for the measurement of the factors social
introversion-extroversion, emotionality, and masculine ideal. They added 87 new
items to their original list of 36, bringing it up to a total of 123. The
Guilfords gave this augmented list of items to 815 new subjects (382 men and
433 women). This time, answers of “Yes,” “No,” and “?” were permitted. The plan
was, say the Guilfords, “to separate these subjects into highest and lowest
quartiles for factors S, E, and M separately, and from the three pairs of
criterion groups to validate every test item and to derive a scoring weight for
it in one or more of the factors.”

All items in the 36-item test were assigned scoring weights in rough agreement
with their factor loadings. Using these weights, factor scores were secured,
and for each factor the highest scoring 25 per cent and the lowest scoring 25
per cent of the subjects were selected as criterion groups. The number of
subjects in each criterion group answering “Yes,” “No,” and “?” was determined.
Then weights were assigned in accord with Strong’s item-weighting procedure. To
make this clear, consider the data for the question “Do you express yourself
more easily in speech than in writing?” The number in each criterion group
answering “Yes,” “No,” and “?” and the calculations leading to the assignment
of scoring weights are set forth in Table 80.
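
The selection of criterion groups and the tallying of answers can be sketched as follows. The subject labels, scores, and answers are hypothetical, and ties at the quartile boundary are broken arbitrarily by sort order, which is an assumption of the sketch rather than anything stated by the Guilfords.

```python
from collections import Counter
from typing import Dict, List, Tuple


def criterion_groups(scores: Dict[str, float],
                     fraction: float = 0.25) -> Tuple[List[str], List[str]]:
    """Return (high group, low group): the top and bottom fraction of subjects."""
    ranked = sorted(scores, key=scores.get)  # lowest score first
    k = int(len(ranked) * fraction)
    return ranked[-k:], ranked[:k]


def tally(group: List[str], answers: Dict[str, str]) -> Counter:
    """Count how many subjects in a group gave each answer to one item."""
    return Counter(answers[subject] for subject in group)


# Hypothetical factor scores and answers to a single item for eight subjects.
scores = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 8}
answers = {"a": "Yes", "b": "No", "c": "?", "d": "Yes",
           "e": "No", "f": "No", "g": "No", "h": "?"}

high, low = criterion_groups(scores)
print(tally(high, answers), tally(low, answers))
```

The two tallies for each item supply the kind of counts that appear in the first two rows of Table 80.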


Table 80. Determination of Scoring Weights*

Group                              Yes      ?      No

High criterion group                61      6     133
Low criterion group                142      8      50
(Difference, high minus low)       -81     -2      83
(Weight)                            -2      0       2
(Constant added)                     4      4       4
(Final scoring weight)               2      4       6

* From Guilford, J. P., and Guilford, R. B. Personality factors S, E, and M,
and their measurement. J. Psychol., 1936, 2, 109-127.
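
The arithmetic of Table 80 can be retraced in a few lines. Reading the columns of the table as “Yes,” “?,” and “No,” the differences between the criterion groups are computed directly. The step from a difference to a weight of -2, 0, or 2 depends on Strong's procedure, which is not reproduced here; those weights are simply copied from the table. The row of 4's is read as a constant added to each weight, which is an interpretation consistent with the final row rather than something the table states. Variable names are hypothetical.

```python
# Counts of answers in each criterion group, as read from Table 80.
high_group = {"Yes": 61, "?": 6, "No": 133}
low_group = {"Yes": 142, "?": 8, "No": 50}

# Difference between the groups for each answer (high minus low).
difference = {answer: high_group[answer] - low_group[answer]
              for answer in high_group}

# Weights copied from the table; they are NOT derived here, because the
# conversion from differences to weights follows Strong's procedure.
weight_from_table = {"Yes": -2, "?": 0, "No": 2}

# Assumed interpretation: a constant of 4 is added to every weight.
CONSTANT = 4
final_weight = {answer: w + CONSTANT for answer, w in weight_from_table.items()}

print(difference)
print(final_weight)
```

Adding the constant leaves the spacing between the weights unchanged while making every scoring weight positive.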


The Guilfords found 101 items with significant scoring weights for one or more
of the factors. These items were assembled into a new form and became the
Nebraska Personality Inventory. The reliabilities of these scores for one group
of 100 cases and for one group of 665 cases are given in Table 81. Via the
Spearman-Brown formula these values were stepped up from the correlations
between the items in the first half and the [...]

Table 81. Reliability Data for the Nebraska Personality Inventory*

Scale                                   (N = 100)    (N = 665)

Social introversion-extroversion           [?]          [?]
Emotionality                               .89          [?]
Masculine ideal                            .65          .65

* From Guilford, J. P., and Guilford, R. B. Personality factors S, E, and M,
and their measurement. J. Psychol., 1936, 2, 109-127.

Table 82. Intercorrelations among the Scoring Weights and among the Scores of
the Nebraska Personality Inventory*

                                                               Short    Long
Scale                                               Weights    form     form

Social introversion-extroversion vs. emotionality     -.02      .18      .46
Social introversion-extroversion vs. masculine ideal   .02      .09      .40
Emotionality vs. masculine ideal                      -.21     -.24     -.01

* From Guilford, J. P., and Guilford, R. B. Personality factors S, E, and M,
and their measurement. J. Psychol., 1936, 2, 109-127.


An Inventory of Factors STDCR. The results presented in Table 82, particularly
those for the long form, were not exactly what the Guilfords had intended.
Therefore, they set about getting new weights and new criterion group
separations. They did this by using “pure” items, i.e., those with large and
significant loadings on one factor and with zero loadings on all others. Double
weight was given to any item with a factor loading of .50 or more. These items
are given in Table 83.
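
The selection rule can be sketched as follows. The source does not say how large a loading must be to count as “large and significant,” nor how close to zero the remaining loadings must lie, so the cutoffs of .30 and .10 below are assumptions, as are the loadings and the names. Only the rule of double weight for loadings of .50 or more comes from the text.

```python
from typing import Dict, Optional


def pure_factor(loadings: Dict[str, float],
                large: float = 0.30,       # assumed cutoff for a "large" loading
                near_zero: float = 0.10    # assumed tolerance for "zero" loadings
                ) -> Optional[str]:
    """Return the one factor an item is 'pure' on, or None if it is not pure."""
    big = [factor for factor, value in loadings.items() if abs(value) >= large]
    if len(big) != 1:
        return None
    factor = big[0]
    others = [abs(value) for name, value in loadings.items() if name != factor]
    return factor if all(value <= near_zero for value in others) else None


def key_weight(loading: float) -> int:
    """Double weight for a loading of .50 or more, single weight otherwise."""
    return 2 if abs(loading) >= 0.50 else 1


# Hypothetical loadings of three items on factors S, E, and M.
items = {
    "item_1": {"S": 0.62, "E": 0.05, "M": -0.03},
    "item_2": {"S": 0.08, "E": 0.41, "M": 0.02},
    "item_3": {"S": 0.40, "E": 0.35, "M": 0.00},
}

scoring_key = {}
for name, loadings in items.items():
    factor = pure_factor(loadings)
    if factor is not None:
        scoring_key[name] = (factor, key_weight(loadings[factor]))

print(scoring_key)
```

An item such as the third, which loads on two factors at once, is left out of the key altogether, since it cannot help to separate the factors.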


Table 83. Items Highly Saturated with Factors S, E, and M*

Social introversion-extroversion
   Do you express yourself better in speech than in writing?
   Do you prefer to read about a thing rather than experience it?
   Are you inclined to keep in the background on social occasions?
   Are you inclined to be slow and deliberate in movement?
   Do you enjoy getting acquainted with most people?
   Are you inclined to keep quiet when out in company?

Emotionality
   Do you daydream frequently?
   Are you inclined to worry over possible misfortunes?
   Do you have frequent ups and downs in mood, either with or without cause?
   Are your feelings rather easily hurt?
   Are you inclined to think about yourself much of the time?

Masculine ideal
   Have you ever kept a personal diary of your own accord?
   Are you frequently somewhat absent-minded?
   Do you adapt yourself easily to new conditions, i.e., to new environments,
      situations, places, etc.?
   Do you like to sell things?
   Are you a male?

* From Guilford, J. P., and Guilford, R. B. Personality factors S, E, and M,
and their measurement. J. Psychol., 1936, 2, 109-127.

At this point the Guilfords set up a new questionnaire to cover more adequately
the “happy-go-lucky” and thinking dimensions. They added to this list their
questions on social introversion-extroversion and gave the combined list of 89
questions to 1,000 subjects (610 men and 390 women). Then they selected 30 of
these items, the 30 items given in Table 84, computed their tetrachoric
intercorrelations, subjected these intercorrelations to a Thurstone centroid
factor analysis, and extracted nine factors.

Table 84. Items Giving Rise to Factors S, T, D, C, and R*

 1. Are you ordinarily a carefree individual?
 2. Do you usually have difficulty in starting a conversation with strangers?
 3. Do you prefer to read about a thing rather than to experience it?
 4. Do you hesitate to lend your personal property even to close friends?
 5. Are you inclined to be considerate of other people’s feelings?
 6. Are you relatively unconcerned about what others think of your actions?
 7. Are you inclined to analyze the motives of others?
 8. Do you consider yourself a practical individual rather than one who
    theorizes?
 9. Do you usually keep in close touch with things going on around you?
10. Are you inclined to worry over possible misfortunes?
11. Do you often have the “blues”?
12. Are you inclined to keep in the background on social occasions?
13. Are you more interested in athletics than in intellectual things?
14. Would you rate yourself as an impulsive individual?
15. Do you enjoy getting acquainted with most people?
16. Do you frequently find yourself in a meditative state?
17. Are you inclined to be over-conscientious?
18. Do you often crave excitement?
19. Are you inclined to ponder over your past?
20. Are you inclined to stop and think things over before acting?
21. Are you less attentive than most individuals to things going on around you?
22. Do you like to discuss the more serious questions of life with your
    friends?
23. Do you like to try your wits in solving puzzles?
24. Would you rate yourself as a happy-go-lucky individual?
25. Do you enjoy thinking out complicated problems?
26. Are you inclined to be introspective, that is, to analyze yourself?
27. Are you usually concerned about the future?
28. Do you usually become so absorbed in watching an athletic contest that you
    completely forget yourself?
29. Can you relax yourself easily when sitting or lying down?
30. Are you more alert to your immediate surroundings than the average person?

* From Guilford, J. P., and Guilford, R. B. Personality factors D, R, T, and A.
J. abn. soc. Psychol., 1939, 34, 21-36.

The social introversion-extroversion factor was rediscovered, the rhathymia
factor was verified, the thinking dimension was split into two subfactors, and
two new factors, depression and alertness, were defined.

We have now described the sources from which Guilford chose 
many of his items for “An Inventory of Factors STDCR.” Guilford 
has not explicitly described the development of the final form of this 
inventory, but we can surmise that it was done much after the 
fashion involved in setting up his initial scales for Factors S, E, and
M. Guilford states that “the 175 items of the inventory were retained after
successive tests of internal consistency of the responses of 400 university
students. . . .”

An Inventory of Factors GAMIN. The Guilfords then constructed a questionnaire
designed to test G. L. Freeman’s hypothesis of an activity drive. They
prepared, with Freeman’s help, a list of 100 questions and gave this list to
600 University of Nebraska and Northwestern University students. They selected
24 items for intensive study, computed their tetrachoric intercorrelations,
subjected the resulting matrix to a Thurstone centroid factor analysis, and
extracted seven factors. Two of these factors were fairly clear-cut, and
indicated dimensions of nervousness and general drive.

Guilford and Howard G. Martin then selected a list of over 300 items covering
the following factors:

G. General pressure for overt activity
A. Ascendancy in social situations as opposed to submissiveness; leadership
   qualities
M. Masculinity of attitudes and interests as opposed to femininity
I. Lack of inferior feelings; self-confidence
N. Lack of nervous tenseness and irritability

The items for factors G and N were suggested by the analysis just discussed,
the items for factor I were suggested by a study conducted by Mosier, and the
items for factors M and A were suggested by some of the material in the
Guilfords’ earlier analyses.

This new list of items was given to 500 students (250 men and 250 women)
attending colleges and universities in Southern California. Preliminary scoring
keys, based upon previous factor analyses, were applied to 400 papers, the 100
highest scoring cases and the 100 lowest scoring cases were selected as
criterion groups (on each factor), and item analyses were conducted.

“Scoring weights were found for each response to every item by using Guilford’s
abac method. This procedure yielded final scoring keys consisting of 41 items
for trait G, 50 items for trait A, 52 items for trait M, 69 items for trait I,
and 69 items for trait N. Only 9 items were scored for more than one trait.”

The Guilford-Martin Personnel Inventory. This inventory “was 
designed with two primary purposes in mind. It was first of all 
designed as a means of assisting supervisors of workers in business 
and industry to single out and diagnose those individuals who are 
personally maladjusted. . . . As a second motive, the test was 
designed to extend the list of temperamental traits already assessed 
by Guilford’s Inventory of Factors STDCR.” 

Guilford and Martin’s intent in the construction of their Personnel Inventory
was to cover the “paranoid disposition.” They selected 200 items which they
felt might be diagnostic of “subjectivity (taking things personally; ideas of
reference; touchiness); belligerence (domineering attitude; craving for
superiority); suspiciousness; and faultfinding or hypercriticalness.” Guilford
and Martin gave their list of questions to 500 industrial and business
employees (250 men and 250 women), scored 400 of the papers on the basis of a
priori scoring keys, applied their test of internal consistency (using top and
bottom fourths as criterion groups), and selected 150 items to comprise the
final form of the inventory. As finally developed, however, it gives only three
rather than four scores. These are “objectivity (as opposed to personal
reference or a tendency to take things personally); agreeableness (as opposed
to belligerence or a dominating disposition and an overreadiness to fight over
trifles); and cooperativeness (as opposed to fault finding or overcriticalness
of people and things).”
Objectivity is measured by [...] people near [...] another when the [...]
Finally, cooperativeness is measured by the responses to questions such as, “Do
you believe that most people shirk their duties whenever they can without
appearing to do so?” “Do you believe that only people with money can be sure of
getting a square deal in courts of law?” and “Do you feel that many young
people get ahead today because they have ‘pull’?”
Scoring weights were determined for these and for the remaining items in the
inventory in accord with Guilford’s procedure for determining phi coefficients
(see pages 245 to 248 of his text Fundamental Statistics in Psychology and
Education). The weights which result from this procedure are directly
proportional to the phi coefficients and inversely proportional to the standard
deviations of the distributions of item responses.
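
The relationship just stated can be illustrated with the usual formula for the phi coefficient of a fourfold table. The sketch below is not Guilford's published procedure; it assumes a dichotomous item, takes the standard deviation of the item responses as the square root of pq, and leaves the constant of proportionality at 1. The counts and names are hypothetical.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for a fourfold table.

    a: high group giving the response    b: high group not giving it
    c: low group giving the response     d: low group not giving it
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("a marginal total is zero; phi is undefined")
    return (a * d - b * c) / denominator


def relative_weight(a: int, b: int, c: int, d: int) -> float:
    """Phi divided by the standard deviation of the item responses."""
    n = a + b + c + d
    p = (a + c) / n                     # proportion giving the response
    item_sd = math.sqrt(p * (1 - p))    # SD of a dichotomous item
    return phi_coefficient(a, b, c, d) / item_sd


# Hypothetical counts for criterion groups of 100 cases each.
phi = phi_coefficient(70, 30, 30, 70)
weight = relative_weight(70, 30, 30, 70)
print(round(phi, 2), round(weight, 2))
```

Dividing by the standard deviation gives relatively more weight to a response that few subjects make, for a given degree of relationship with the criterion.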
Reliability. The reliabilities of the scores on the Guilford inventories range
from .80 to .94. They are summarized in Table 85. Those for factors STDCR were
determined from the correlations between alternate sixths of the items divided
into two equal pools. Those for factors GAMIN, O, Ag, and Co were determined
from the correlations between random halves of the items. All reliabilities
reported in Table 85 were determined on 100 cases not included in the
standardization groups.

Table 85. Reliability Data for the Guilford and Guilford-Martin Inventories*

Social introversion-extroversion        .90
Thinking introversion-extroversion      .84
Depression                              .94
Cycloid disposition                     .88
Rhathymia                               .90
General activity                        .89
Ascendance-submission                   .88
Masculinity-femininity                  .85
Inferiority feelings                    .91
Nervousness                             .89
Objectivity                             .83
Agreeableness                           .80
Cooperativeness                         .91

* From Guilford, J. P. Manual of Directions. (Rev. Ed.) An Inventory of Factors
STDCR; Manual of Directions and Norms. The Guilford-Martin Inventory of Factors
GAMIN; and Manual of Directions and Norms. The Guilford-Martin Personnel
Inventory. Beverly Hills, Calif.: Sheridan Supply Company.

Intercorrelations. Final score intercorrelations are presented in Tables 86,
87, and 88. Those among factors STDCR are given in Table 86, those among
factors GAMIN are given in Table 87, and those among factors O, Ag, and Co are
given in Table 88. These intercorrelations are, in several instances, rather
far removed from the ideal intercorrelation of .00. But they are sufficiently
moderate to suggest that sensibly different variables are measured by the
different scales.

Validity. We come now to the question of validity and find three sets of data
to report. The first consists of a comparison between the scores on factors
STDCR and a series of self-ratings and a composite rating by five associates.
The scales on which these ratings were made are given in Fig. 6. Table 89 shows
the reliabilities of these ratings and their correlations with the factor
scores. Correlations [...]


TABLE 86. Intercorrelations among Factors S, T, D, C, and R*


Factor Tr D Cc | R 
= P 

S | AS 49 Be 54 

T | Pa i) 14 | 21 

D 85 -36 

C | .02 


* From Guilford, J. P. Manual of Directions. (Rev. Ed.) An Inventory of
Factors STDCR. Beverly Hills, Calif.: Sheridan Supply Company.


TABLE 87. Intercorrelations among Factors G, A, M, I, and N*

Factor      A      M      I      N

G          .51    .16    .39    .27
A                 .34    .54    .16
M                        .43    .34
I                               .70

* From Guilford, J. P. Manual of Directions and Norms. The
Guilford-Martin Inventory of Factors GAMIN. Beverly Hills, Calif.:
Sheridan Supply Company.

TABLE 88. Intercorrelations among Factors O, Ag, and Co*

Factor      Ag     Co

O          .64    .55
Ag                .63

* From Guilford, J. P. Manual of Directions and Norms. The
Guilford-Martin Personnel Inventory. Beverly Hills, Calif.: Sheridan
Supply Company.

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This was given by Guilford and Martin to 51 aircraft employees. 
Upon the basis of the distributions of scores obtained (on objectivity, 
agreeableness, and cooperativeness), Guilford and Martin predicted 
that anyone “below the median on two or more traits” would be 
found unsatisfactory, i.e., would be found to be a troublemaker, etc.


S—Shuns society of others; is shy
        1 2 3 4 5 6 7 8 9
   Seeks society of others; is a “good mixer.”

T—Frequently indulges in meditative thinking
        1 2 3 4 5 6 7 8 9
   Shuns meditative or reflective thinking

D—Emotions and moods predominantly unpleasant; worried; depressed
        1 2 3 4 5 6 7 8 9
   Emotions and moods predominantly pleasant; cheerful; optimistic

C—Has frequent or radical changes in mood
        1 2 3 4 5 6 7 8 9
   Very uniform in mood

R—Happy-go-lucky; carefree
        1 2 3 4 5 6 7 8 9
   Serious minded; conscientious

Fig. 6. Rating scales used in validating factors STDCR. (From Guilford,
J. P., and Guilford, R. B. Personality factors D, R, T and A. J. abnorm.
soc. Psychol., 1939, 34, 21-36.)


TABLE 89. Reliability and Validity Data for the Rating Scales in Fig. 6*

| : 
Year |Number} S fT D c R Rating by 
Validity 

1938 | 50 .68 .46 35i, 4 -65 | Self 
1939 51 60 wg 45 ee. :39 | Self 
1938 | 50 -70 .08 .60 -20 -48 | Others 
1939 | 51 43 ae -46 | Others 

[i 

Reliability 

1 55 Pe 51 58 -68 | Self and others 
D FH 63 Esi 54 aa .63 | Self and others 

| 32 | e | ol | -73 | -38 | -54 | Others 


* From Guilford, J. P., and Guilford, R. B. Personality factors D, R, T
and A. J. abnorm. soc. Psychol., 1939, 34, 21-36.


Twenty-two of the workers in the group of 51 actually were
considered troublemakers by management, and the remaining 29 were
considered satisfactory. Without knowing management's classification,
Guilford and Martin correctly classified 73 per cent of the
unsatisfactory workers and 66 per cent of the satisfactory workers.
This result argues for some basic validity, Guilford and Martin
believe, for the scores on their Personnel Inventory.
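
The prediction rule and the reported hit rates can be checked with a small sketch. The counts of 16 and 19 correct classifications are not given in the source. They are inferred here as the only whole numbers that round to 73 per cent of 22 and 66 per cent of 29. The trait labels, medians, and the sample worker are hypothetical.

```python
def predicted_unsatisfactory(scores, medians):
    # Rule quoted above: "below the median on two or more traits".
    below = sum(1 for trait, value in scores.items() if value < medians[trait])
    return below >= 2

def hit_rates(n_unsat, n_sat, unsat_correct, sat_correct):
    # Percentage correctly classified in each group and over all workers.
    return (100 * unsat_correct / n_unsat,
            100 * sat_correct / n_sat,
            100 * (unsat_correct + sat_correct) / (n_unsat + n_sat))

# Hypothetical worker, scored on objectivity, agreeableness, cooperativeness.
medians = {"O": 5, "Ag": 5, "Co": 5}
worker = {"O": 3, "Ag": 7, "Co": 2}
print(predicted_unsatisfactory(worker, medians))  # True

# 22 troublemakers and 29 satisfactory workers; 16 and 19 are inferred counts.
unsat_pct, sat_pct, overall_pct = hit_rates(22, 29, 16, 19)
print(round(unsat_pct), round(sat_pct), round(overall_pct, 1))  # 73 66 68.6
```

On these inferred counts roughly 69 per cent of all 51 workers were classified correctly, a figure the source does not itself report.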
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The third set of validity data relate to factor M. To validate the 
scores on this factor, Guilford and Martin compared the scores of 
men and women not included in their standardization or item analy- 
sis groups. They found that 92 per cent of 50 men secured scores 
“above the median of the distribution . . . of the two sexes combined,”
and that 92 per cent of 50 women secured scores “below the
median of this distribution.” A phi coefficient was computed and 
found to be .84. This value Guilford and Martin consider “highly 
satisfactory.” 
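
The reported phi coefficient can be reproduced from the percentages given. The sketch below assumes that 92 per cent of 50 means 46 persons in each group and uses the usual fourfold-table formula for phi; the function name is illustrative.

```python
import math

def phi_from_counts(a, b, c, d):
    # Fourfold table:      above median   below median
    #   men                     a              b
    #   women                   c              d
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

men_above = round(0.92 * 50)        # 46 men above the combined median
men_below = 50 - men_above          # 4
women_below = round(0.92 * 50)      # 46 women below the combined median
women_above = 50 - women_below      # 4

phi = phi_from_counts(men_above, men_below, women_above, women_below)
print(round(phi, 2))  # 0.84
```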

Norms. All norms for the Guilford inventories are given in terms
of what Guilford calls a C scale, a normalized distribution with 11
intervals. Guilford assumes a normal distribution underlying each
of his factor distributions and a predetermined percentage of cases
in each score interval. These norms will be found in his Manuals of
Directions.
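
A minimal sketch of how such a scale can be built follows. It assumes that the 11 intervals are each half a standard deviation wide and centered on the mean, with the two end intervals open. That spacing is an assumption of this sketch; the exact percentages used for the norms are those in Guilford’s Manuals.

```python
import math
from statistics import NormalDist

def c_scale_percentages():
    # Ten cut points at -2.25, -1.75, ..., +2.25 standard deviations
    # divide the normal curve into 11 intervals scored 0 through 10.
    z = NormalDist()
    cuts = [-2.25 + 0.5 * i for i in range(10)]
    cdf = [0.0] + [z.cdf(c) for c in cuts] + [1.0]
    return [100 * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

def c_score(z_value):
    # Interval number (0-10) into which a standard score falls.
    return min(10, max(0, math.floor((z_value + 2.75) / 0.5)))

pcts = c_scale_percentages()
print([round(p, 1) for p in pcts])
print(c_score(0.0), c_score(0.3), c_score(-3.0))  # 5 6 0
```

Under this assumption about one-fifth of the cases fall in the middle interval and a little over 1 per cent in each extreme interval.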


THE ALLPORT-VERNON STUDY OF VALUES 


Our fifth example of a multidimensional approach to the measurement
of personality is contained in the Allport-Vernon Study of
Values. This test was designed in 1931 by P. E. Vernon and
Gordon W. Allport (and revised in 1951 by Allport, Vernon, and
Gardner Lindzey) to measure six pervasive or broad evaluative
attitudes first defined by the philosopher Edouard Spranger. Vernon
and Allport feel that


. . . the isolation and measurement of single habits, traits or capacities within
personality give an incomplete and frequently misleading picture. It is evident [they
say] that in some fashion . . . the significance of these single factors is dependent
upon the total personality in which they are set. [Therefore] an investigator must
find within a personality broad functions that are common to all other personalities.


These functions are to be found, argue Vernon and Allport, in
Spranger’s list of evaluative attitudes. The Spranger attitudes, as
redefined by Allport and Vernon, are as follows:


Theoretical. . . . “cognitive” attitude, one . . . itself of judgments regarding
the beauty or utility of objects, . . . to observe and to reason. Since
the interests of the theoretical man are empirical, critical, and rational, he is
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necessarily an intellectualist, frequently a scientist or philosopher. His chief aim in 
life is to order and to systematize his knowledge. 

Economic. The economic man is characteristically interested in what is useful. 
Based originally upon the satisfaction of bodily needs (self-preservation), the inter- 
est in utilities develops to embrace the practical affairs of the business world, the 
production, marketing and consumption of goods, the elaboration of credit, and 
the accumulation of tangible wealth. This type is thoroughly practical and con- 
forms well to the prevailing conception of the average American businessman. 

Aesthetic. The aesthetic man sees his highest value in form and harmony. Each 
single experience is judged from the standpoint of grace, symmetry or fitness. He
regards life as a manifold of events; each single impression is enjoyed for its own 
sake. He need not be a creative artist; nor need he be effete; he is aesthetic if he 
but finds his chief interest in the artistic episodes of life. 

Social. The highest value for this type is love of people, whether of one or many, 
whether conjugal, filial, friendly, or philanthropic. The social man prizes other 
persons as ends, and is therefore himself kind, sympathetic, and unselfish. He is 
likely to find the theoretical, economic, and aesthetic attitudes cold and inhuman. 
In contrast to the political type, the social man regards love as itself the only
suitable form of power, or else repudiates the entire conception of power as
endangering the integrity of personality. In its purest form the social interest is
selfless and tends to approach very closely to the religious attitude.

Political. The political man is interested primarily in power. His activities are
not necessarily within the narrow field of politics; but whatever his vocation he
betrays himself as a Machtmensch. Leaders in any field generally have a high
power value. Since competition and struggle play a large part in all life, many
philosophers have seen power as the most universal and most fundamental of
motives.

Religious. The highest value for the religious man may be called unity. He is
mystical, and seeks to comprehend the cosmos as a whole, to relate himself to its
embracing totality. Spranger defines the religious man as one “whose mental
structure is permanently directed to the creation of the highest and absolutely
satisfying value experience.”


Edouard Spranger, a philosopher of the Verstehende school, postulated
in his book Types of Men that men are guided primarily by
one or the other of these six predominant all-pervasive attitudes.
Thus one man will behave or act, says Spranger, primarily from a
theoretical point of view, another primarily from a religious point
of view, another primarily from an aesthetic point of view, and so
on. It is not hard for us to find examples illustrative of Spranger’s
theory. We can cite the poet or musician who looks at things from
an aesthetic point of view, the politician who looks at things from a
political point of view, the minister who looks at things from a
religious point of view, and so on. Spranger so carefully defined his
types that we can see them almost on every hand. Spranger rested
his case at the arm-chair level, however, and did not attempt an
empirical verification of his theory. This latter task was undertaken
by P. E. Vernon and Gordon W. Allport in their development of the
Study of Values.

Allport and Vernon constructed the Study of Values to show the
relative predominance of Spranger’s six evaluative attitudes. The
scoring is such that we can discover only whether a man is governed
more or less by a theoretical attitude than by, let us say, a religious
attitude, and so forth. This is in contrast with most scales of
personality measurement in which the score on a scale is obtained
independently of, and without reference to, the other ways in which
the test may be scored.

The Study of Values consists of two parts. There are 30 items in
Part I and 15 items in Part II. The first item in Part I is worded this
way:


The main object of scientific research should be the discovery of pure truth rather 
than its practical applications. 


The subject must answer “Yes” or “No,” thus indicating whether
he agrees or disagrees with the statement. In this instance if he
answers “Yes,” the inference is that he is guided more by a theoretical
attitude, but if he answers “No,” the inference is that he is
guided more by an economic attitude. The remaining questions in
Part I offer similar comparisons. Each evaluative attitude is
represented in 10 of these comparisons. The first item in Part II reads as
follows:


Do you think that a good government should aim chiefly at:
a) more aid for the poor, sick and old
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Development of Test. Allport and Vernon prepared the items for 
the Study of Values “upon the basis of their fidelity in representing 
Spranger’s six types.” The only evidence of this fidelity, however, is 
Allport and Vernon’s claim that such is the case. 

In any test requiring the choice between two or more alternative 
answers, care must be taken to see that choices are made upon the 
basis of the crucial issue. If they are made upon the basis of factors 
irrelevant to the main issue, this will defeat the purpose of the test. 
In the present instance Allport and Vernon wanted the popularity 
of the alternatives to be equated so that a subject would not choose 
an alternative because of its greater popularity. Allport and Vernon 
tried a number of alternatives for each item until they satisfied 
themselves that the alternatives left in the final form of the test 
differed little or not at all in terms of their relative popularity. 

Vernon and Allport’s next step consisted of making an internal
consistency analysis, using the top and bottom fourths on each
variable as their criterion groups. They made this analysis on three
different groups of subjects. There were more than 160 subjects in
each group, so that there were at least 40 subjects included in each
one of the criterion groups. Vernon and Allport then tabulated the
“average marks which these extreme groups gave to each of the
proposed answers, that referred to the value in question.” Then
they computed the difference between the average scores, and
divided this difference by its probable error. All items retained had
differences greater than three times their own probable errors, and
the average difference for the items retained was six times its
probable error. In the 1951 revision, Vernon, Allport, and Lindzey report
similar item analyses on 780 subjects and state that each item in the
test correlates significantly with the total scale of which it is a part.
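
The retention criterion can be sketched in a few lines. The sketch assumes the conventional normal-theory relation, probable error = 0.6745 times the standard error, and computes the standard error of a difference between two independent means. The marks below are invented for illustration; Vernon and Allport’s own data are not reproduced in the text.

```python
import math
from statistics import mean, stdev

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal theory)

def critical_ratio(top_marks, bottom_marks):
    # Difference between the mean marks of the top and bottom fourths,
    # divided by the probable error of that difference.
    diff = mean(top_marks) - mean(bottom_marks)
    se_diff = math.sqrt(stdev(top_marks) ** 2 / len(top_marks)
                        + stdev(bottom_marks) ** 2 / len(bottom_marks))
    return diff / (PE_FACTOR * se_diff)

def retain_item(top_marks, bottom_marks, threshold=3.0):
    # Keep the item only if the difference exceeds three probable errors.
    return critical_ratio(top_marks, bottom_marks) > threshold

# Hypothetical marks given to one answer by the two extreme groups.
top = [3, 3, 2, 3, 2, 3, 3, 2]
bottom = [1, 2, 1, 1, 2, 1, 1, 2]
print(round(critical_ratio(top, bottom), 2), retain_item(top, bottom))
```

A ratio of three probable errors corresponds to about two standard errors, so the criterion is roughly equivalent to a conventional 5 per cent significance test.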


In Part I a subject can secure 1, 2, or 3 points for each answer.
Since each evaluative attitude is represented ten times, this means
that the score for each scale can range from 10 to 30. In Part II a
subject can give ranks of 1, 2, 3, or 4. These are summed, and since
each evaluative attitude is represented ten times, the scores on each
scale can range from 10 to 40. In this instance, since these scores are
ranks, they are subtracted from 40 so that a high numerical value
will represent the predominance of an attitude. In two instances
the scores are subtracted from 36 and 42 (not from 40). The first of
these is for the social scale, and the second is for the religious scale.
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These subtractions correct these scores for the greater popularity 
of the items indicating social attitudes and the lesser popularity 
of the items indicating religious attitudes in comparison with the 
items in the other four scales. The final scores can range from 0 to 
60, with 30 being the average or neutral score on each scale. The 
predominance of an evaluative attitude is indicated by a score of 40 
or more, and its lack of predominance by a score of 20 or less. 
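
The scoring arithmetic described above can be sketched as follows. The sketch assumes that the final score on a scale is the Part I points plus the converted Part II ranks, which the text implies but does not state outright. The function names and the label "intermediate" for scores between 20 and 40 are illustrative.

```python
# Constants from which the summed Part II ranks are subtracted.
RANK_CONSTANTS = {"social": 36, "religious": 42}  # all other scales use 40

def scale_score(scale, part1_points, part2_rank_sum):
    # Low ranks mean high preference, so the rank sum is reversed by
    # subtracting it from the constant for that scale.
    constant = RANK_CONSTANTS.get(scale, 40)
    return part1_points + (constant - part2_rank_sum)

def interpret(score):
    # Cutoffs given in the text: 40 or more, 20 or less.
    if score >= 40:
        return "predominant"
    if score <= 20:
        return "not predominant"
    return "intermediate"

# Hypothetical subject with 25 points in Part I and a rank sum of 20 in Part II.
for scale in ("theoretic", "social", "religious"):
    score = scale_score(scale, 25, 20)
    print(scale, score, interpret(score))
```

The same raw answers thus yield 45 on the theoretic scale, 41 on the social scale, and 47 on the religious scale, which is the popularity correction at work.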
Reliability. Allport and Vernon, Whitely, and Allport, Vernon,
and Lindzey report the split-half and repeat reliabilities given in
Table 90. On the old form, the social scale had the lowest reliability
and the aesthetic scale the highest. On the new form, the aesthetic
scale has the lowest reliability and the religious scale the highest. In
general, the new form possesses a considerably greater degree of
reliability than did the original form.


TABLE 90. Reliability Data for the Allport-Vernon Study of Values*

               Revised edition            Original edition

               Allport, Vernon,     Vernon and Allport      Whitely
               and Lindzey
Scale          Split-half           Split-half   Repeat     Repeat

Theoretic         .73                  .62         .66        .68
Economic          .87                 . . .        .71        .79
Aesthetic         .80                  .84         .84        .86
Social            .82                  .49         .39        .50
Political         .77                  .53         .55        .76
Religious         .90                  .84         .80        .87


Validity. In determin . . . of Values, we have two
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between the scales on the original and revised editions of the test 
vary between .31 and .75. These values would seem to raise some 
question as to how validly the scales on either edition represent the 


original Spranger types. 


Evidence on the second question is more plentiful. Vernon and 
Allport, Stone, Cantril and Allport, and Allport, Vernon, and 
Lindzey have secured evidence of differences in mean scores of men 
and women and of students engaged in different studies. Table 91 


TABLE 91. Mean Study of Values Scores for Men and Women*

                 Revised edition         Original edition

Scale            Men       Women         Men          Women
                 N = 81    N = 96        N = 1,163    N = 1,592

Theoretic        43.3      36.4          30.8         27.7
Economic         42.1      38.8          32.0         27.0
Aesthetic        37.2      42.2          27.0         33.0
Social           . . .     41.2          29.7         31.6
Political        42.7      38.1          32.1         27.9
Religious        37.0      43.2          28.0         33.3

* From Cantril, H., and Allport, G. W. Recent applications of the Study
of Values. J. abnorm. soc. Psychol., 1933, 28, 259-273; and from
Allport, G. W., Vernon, P. E., and Lindzey, G. Study of Values. (Rev. Ed.)
Manual of Directions. Boston: Houghton Mifflin Company, 1951.


TABLE 92. Mean Study of Values Scores for Students with Different College Majors*


| 
Subject Number | Theoretic Economic Aesthetic | Social | Political | Religious 

— oe a a ca | A 
Banking.,..........|. 21 29 38 a |2| 3# 19 
Business............ 110 28 | | as | 30 | 33 | 24 
Commercial.......... 125 | 32 | 34 | 22 29 | 32 | 31 
Education... A 21 2 | 26 35 | 30 32 29 
Engineering. 32 35 | 26 29 31 26 

“Ngincering oe 64 34 5 | & 

“aw, 3 | 26 31 | 30 31 37 26 
Literature 24 24 277 | 40 29 | 30 30 
Medical... 45 36 27 3 | 30| 28 28 
Missionary... ; 7 23 2 | 35 | 2 49 

Issionary . . 80 2 

Psy nected 23 37 31 | 2 22 

'sychology.. . . 10 44 | ee 
Salesmen 110 2 | 3 24 27 36 26 
Science... | 48 35 26 si | Sl | 8 28 
ia | | me | | ea) ® 

* From Cantril, H., and Allport, G. W. Recent applications of the Study
of Values. J. abnorm. soc. Psychol., 1933, 28, 259-273.




shows mean scores for men and women, and Table 92 shows mean 


scores for students majoring in different subjects. Vernon and All- 
port conclude: 


The Study of Values test . . . affords a method of scaling the relative predom- 
inance of the theoretical, economic, aesthetic, social, political and religious values 
in personality. The results indicate that Spranger is on the whole justified in regard- 
ing these values as constituting generalized motives in men, and that the test 


succeeds in determining with some precision the prominence of each value in any 
single individual. 


Biographical Application. Personality tests are designed to be
used by living individuals. The questions are usually too detailed and
too specific for use in connection with historic personages. One
attempt at such an application has been made, however, with the
Study of Values. We give this study in some detail, for to date it
represents, apart from the work of Cox in estimating the intelligence
quotients of 300 eminent men from biographical data, a unique
application of a personality test. This application consists of an
attempt to fill out the Study of Values for Jonathan Swift, author
of Gulliver’s Travels (to Lilliput, Brobdingnag, Laputa, and the
country of the Houyhnhnms, etc.).


From various biographies, 
himself, data were secured w 
Swift the Study of Values. 


as well as from the works of Swift 
hich enabled the author to fill out for 


It was hoped that by the use of the 
ions furnished by the Study of Values an im- 


are given in Fig. 7. 


res are 42.0, 33.0, 31.5, 30.0, 
» Social, economic, aesthetics 
the political, social, 
an gnificantly from the 
1g1 i i 

markedly. Ferguson reports Sous and theoretic scales differ 


25.5, and 18.0 for th 


shows that Swift would have pre- 
about improvement 1^ 
hreats to constitutional 
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atoms; and he would have preferred reading accounts of the lives and works of 
such men as Aristotle, Plato and Socrates rather than those of Alexander, Julius 


Caesar and Charlemagne. 
If a series of popular lectures had been offered Swift would have preferred one 


on the progress and needs of social service work to one on contemporary painters; 


[Profile chart: scale value (vertical axis) plotted for each evaluative
attitude (horizontal axis): theoretic, economic, aesthetic, social,
political, religious.]

Fig. 7. Profile of the evaluative attitudes of Jonathan Swift. (From Ferguson,
L. W. The evaluative attitudes of Jonathan Swift. Psychol. Rec., 1939, 3, 26-44.)


and under like circumstances would have preferred one on the ee — 
a mparative methods of government. 
ment of the great religious faiths to one on the co! { 
Swift rs have reira to write on the defects of the educational gr 
rather than on the role of church-going in religion; and he would ie Be erre 
Writing on the distribution of one’s income between the necessities of life, luxuries 


ity of friend. 

and savings rather than on the personality of a close frier f 
It was me justifiable, according to Swift, for great artists to be E of T 
feelings of others; and “unselfishness and sympathy he se ee more F F 
character traits than high ideals and reverence. Because of the aggressive and
self-assertive nature of man he thought the abolition of war an illusory ideal. He did
not think that one who analyzes his emotions was any less sincere than one who
did not; and he did not believe that contemporary charitable policies should be
. . . the world would be a much better place if people took to heart
the teaching “Lay not up for yourselves treasures upon earth . . . but lay up
for yourselves treasures in heaven,” etc.; he thought the aim of churches should be
to convey spiritual worship and a sense of communion with the highest rather than
to attempt to bring out altruistic and charitable tendencies; taking the Bible
as a whole, he did not consider it as beautiful mythology but rather as spiritual
revelation.

If Swift had been a university professor he would rather have taught poetry 
than physics or chemistry; he would have preferred teaching economics to law. If 
nothing of the kind had existed in the college he would have preferred founding a 
debating society rather than a classical orchestra.

The main objects of scientific research were not, according to Swift, the discovery
of pure truth rather than its practical applications; he did not believe that the
universe had evolved to its present state in accord with mechanistic principles,
thus doing away with the necessity of assuming a God; he did not believe our
modern developments a sign of a greater degree of civilization than those attained
by any previous race.

Swift thought a good government, first of all, should aim at introducing ethical
principles into its policies and diplomacy, and last of all at establishing a position
of prestige and respect among nations. Between each of these extremes lay the aims
of providing more aid for the poor, sick, and old; and the developing of manufacturing
and trade.

If Swift could have influenced the educational policies of schools he would have
undertaken, first of all, to develop cooperativeness of spirit and service; secondarily,
to promote the study and performance of drama; and last of all to provide additional
laboratory facilities.

Swift’s excess income first was used to endow his church and to establish hospitals;
secondarily it would have been applied to industrial development; and last
of all, would he have given it to a university for scientific research. He would have
preferred establishing a mental hygiene clinic to aiming at a seat in the Cabinet,
and the latter to entering into banking and finance, and this to making a collection
of sculptures or paintings.

Swift preferred for personal friends those who possessed qualities of leadership;
secondly those of refinement and emotional sensitivity; thirdly, those seriously
interested in thinking out their attitude toward life as a whole; and lastly, those
who were efficient, industrious, and of a practical turn of mind. At evening discussions
with these friends he preferred talking about literature rather than about
socialism and social amelioration; about this rather than about the meaning of
life; and about the latter rather than about philosophy and psychology.

Swift preferred being a clergyman to being a politician; this in turn to being a
sales manager; and this to being a mathematician. He thought one should guide
his conduct in accord with his religious faith rather than in accord with his business
organization and associates; in accord with the latter rather than in accord with
the ideals of beauty; and in accord with these rather than in accord with the
precepts of society as a whole.


We must remember that this qualitative description of Swift is 
based solely upon the alternatives suggested by the Study of Values. 
Answers said to be characteristic of Swift are characteristic of him 
only in reference to the rejected alternative or alternatives. Alterna- 
tives not suggested by the Study of Values might easily have been 
possible. 

Relation to the Strong Vocational Interest Test. We have seen 
by the data in Table 92 that students of literature get high aesthetic 
scores, that law students get high political scores, that commerce
students get high economic scores, and that engineering and science 
students get high theoretical scores. Stone has supplemented these 
results by reporting practically identical findings. His only addition 
is that medical students get high theoretical scores. These results 
suggest a degree of parallelism between the areas of personality 
probed by the Study of Values and by the Strong Vocational Interest 
Test. This parallelism has not gone unnoticed and has been examined 
by Sarbin and Berdie, by Duffy and Crissy, and by Ferguson, 
Humphreys, and Frances W. Strong. 

Ferguson, Humphreys, and Strong gave both tests, that is, the
Strong Vocational Interest Test and the Study of Values, to 93 male
Stanford University students. They scored the Strong Vocational
Interest Test on eight scales: teacher, office worker, life insurance
salesman, certified public accountant, physician, lawyer, Y.M.C.A.
secretary, and chemist, and then correlated these scales with each
other as well as with all six scales on the Study of Values. These
intercorrelations are presented in Table 93.

Next Ferguson, Humphreys, and Strong subjected the intercorrelations
in Table 93 to a centroid factor analysis and interpreted
the factors they extracted as follows:


Factor I. Strong scales for lawyer (.77), physician (.51), and office worker (—.78);
and Allport-Vernon scales for aesthetic (.38), theoretic (.35), and economic (. . .).

Factor II. Strong scales for teacher (.76), Y.M.C.A. secretary (.79), certified
public accountant (.53), office worker (.56), and physician (—.40).

Factor III. Strong scales for chemist (.95), physician (.65), teacher (.3 . . .), lawyer
(—.40), and life insurance salesman (—.93), and Allport-Vernon scales for theoretic
(.59) and political (—.58).

Factor IV. Allport-Vernon scales for religious (.89), social (.47), and aesthetic
(—.47).

Factor V. Allport-Vernon scales for political (.72) and economic (—.60).
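
The first step of a centroid analysis can be sketched briefly. The sketch follows the usual textbook form of Thurstone’s centroid method: each diagonal cell is replaced by the largest correlation in its column as a communality estimate, and each first-factor loading is the column sum divided by the square root of the grand total. The four-variable matrix is hypothetical and is not taken from Table 93. Later factors, which require residual matrices and sign reflection, are not shown.

```python
import math

def first_centroid_loadings(r):
    # r is a square correlation matrix given as a list of lists.
    n = len(r)
    m = [row[:] for row in r]
    # Communality estimate: largest absolute off-diagonal value in the column.
    for j in range(n):
        m[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    # Loading of each variable on the first centroid factor.
    return [s / math.sqrt(total) for s in col_sums]

# Hypothetical intercorrelations among four scales.
r = [
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
]
print([round(a, 3) for a in first_centroid_loadings(r)])
```

When every correlation is .50 each scale loads about .71 on the first factor, so the factor reproduces each correlation exactly (.71 times .71 is .50).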

Ferguson, Humphreys, and Strong state that we cannot argue
“from the above results that the Allport-Vernon Study of Values
contains factors which the Vocational Interest Test does not, for
all of the scales from the latter test were not included in the analysis.
Since Factor II, on the other hand, is described entirely in terms of
Strong scales, it can be asserted that the Vocational Interest Test
measures at least one factor that the Study of Values does not.”
Finally, these investigators conclude that “tests for any one of
the . . . five factors which they isolated should be more significant
in the analysis of behavior than any one of the Strong or Allport-Vernon
scales by itself.” Unfortunately these authors have not
followed up this suggestion.

Duffy and W. J. E. Crissy gave the Allport-Vernon Study of 
Values and the Strong Vocational Interest Test to 108 women 
freshmen at Sarah Lawrence College. They scored the Vocational 
Interest Test on the scales for nurse, lawyer, librarian, secretary- 
stenographer, physician, artist, author, housewife, office worker, and 
social worker, and correlated the scores on these scales with those 
on the Study of Values. Their results are presented in Table 94. They 
subjected these correlations to a centroid factor analysis and, after 
rotations had been made, secured the following factor interpretations. 


TABLE 94. Intercorrelations between the Scores on the Study of Values and Certain of
the Scores on the Strong Vocational Interest Blank for Women*


Scale Theoretic | Economic} Political | Aesthetic Social | Religious 

INCRE. cigars arcsec nt —.02 | —.06 11 | —.24 .28 .00 
Lawyer.. | 2 25 124 | —.36 ‘00 | —.25 
Librarian. auis ooste e | -2 | 0 36 | —.32 D 
Secretary... et myi .21 .32 —.19 a = be 
Physician., a ag | 206 | elt | HBG) Se i 
T E S .08 —.31 —.22 45 —.35 2 
A TAT ae ii =G | S .37 E k 
Housewife... wf —-32 M -14 —.17 å 4 
Off k — 6 aT 34 — .32 19 —.17 

ce worker... u.. ceee . 3 F Fi B 
Social worker. ...... s.. a5 —.11 —.0 -. Š d 


* From Duffy, E., and Crissy, W. J. E. Evaluative attitudes as related to
vocational interests and academic achievement. J. abnorm. soc. Psychol.,
1940, 35, 226-245.
On Factor I scales for political (.65), economic (.64), and lawyer
interest (.36) are opposed to scales for religious (—.51), aesthetic
(—.40), and author interest (—.33). Duffy and Crissy call this a
Philistine factor because of the association of economic, political,
and lawyer interest values.

Duffy and Crissy’s second factor is described by scales for social 
(.60), housewife (.38), and secretary (.30) which are opposed to the 
theoretic scale (—.38). This factor Duffy and Crissy call an
“interest in people.”

Finally, Factor III contrasts the theoretic scale (.55) with the 
religious scale (—.48). Duffy and Crissy call this a “theoretic” factor.

Scores from both the Study of Values and the Vocational Interest
Test are represented on two of the factors, but the third factor is
defined entirely in terms of scales from the Study of Values. But
just as in Ferguson, Humphreys, and Strong’s study, we cannot
conclude that the Study of Values measures anything which the
Vocational Interest Test does not measure. This is because Duffy
and Crissy, like Ferguson, Humphreys, and Strong, did not score
all of the scales on the Vocational Interest Test.



Sarbin and Berdie gave the Study of Values and the Vocational
Interest Test to 59 male students at the University of Minnesota
and treated their data somewhat differently than did Ferguson,
Humphreys, and Strong, and Duffy and Crissy. Sarbin and Berdie
scored the Vocational Interest Test on 26 occupations and then
arranged these according to Strong’s classification (Table 14, Chap.
2). Then they classified the students as having or not having the
interests in each group of . . . than in each specific
occupation. Following this, Sarbin and Berdie computed
mean Study of Values scores for each of their several contrasting
pairs of students, . . . indicate the
significance of the differences . . . contrasting pairs of
means, and secured the results . . .

Their data indicate . . . attitudes are related positively
to . . . professional occupations and
. . . salesmen. They indicate that economic
. . . to the interests . . .
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“language” occupations, and negatively to the interests of business- 
men. The social attitude appears related positively to the interests 
of salesmen and to “language” occupations but related negatively 
to the interests of businessmen. Political attitudes are related posi- 
tively to the interests of businessmen and are related negatively to 
the interests of scientific and professional men, farmers, etc. Finally 
the religious attitude is related positively to “interest in people” and 
is related negatively to the interests of certified public accountants. 

Table 95 brings together the data for three occupations common to
both the men’s and women’s Vocational Interest blanks. Duffy and
Crissy’s data, and Ferguson, Humphreys, and Strong’s data are
given in terms of correlations, while Sarbin and Berdie’s data are
given in terms of t ratios. In the latter case we must realize, too, that
the t ratios refer, not to the specific occupation alone, but to the
group of occupations in which each is included.

TABLE 95. Correlations between the Allport-Vernon Study of Values Scales and the
Lawyer, Physician, and Office Worker Vocational Interest Scores*

F-H-S = Ferguson, Humphreys, and Strong; D-C = Duffy and Crissy;
S-B = Sarbin and Berdie; [?] = entry illegible

                     Lawyer                Physician             Office worker
Scale          F-H-S   D-C    S-B     F-H-S   D-C    S-B     F-H-S   D-C    S-B
Theoretic       .01    [?]   −1.4      .59    [?]    4.1     −.39   −.26   −1.5
Economic       −.43    [?]   −2.0     −.49   −.06   −2.3      .51    .37    1.8
Aesthetic       .12   −.36    1.8      .18   −.16    2.4     −.30   −.32   −1.9
Social          .10    .00    [?]      .09   −.08    0.2     −.08    .19   −1.6
Political      −.03    .24    1.0     −.32    [?]   −3.5      .19    .34    2.0
Religious       [?]    [?]   −0.8      [?]   −.05   −1.1     −.04   −.17    1.6

* Adapted from: Duffy, E., and Crissy, W. J. E. Evaluative attitudes as related to voca-
tional interests and values. J. abnorm. soc. Psychol., 1940, 35, 226-245; Ferguson, L. W.,
Humphreys, L. G., and Strong, F. W., Jr. A factorial analysis of interests and values. J. educ.
Psychol., 1941, 32, 197-204; and from Sarbin, T. R., and Berdie, R. F. Relation of measured
interests to the Allport-Vernon Study of Values. J. appl. Psychol., 1940, 24, 287-296.

All three sets of data show that the theoretical attitude is related
positively to physician interest and related negatively to office
worker interest. They also agree in showing that the economic atti-
tude is related positively to office worker interest and negatively to
physician interest. All three studies agree in showing the aesthetic
attitude to be related negatively to office worker interest. Finally, 
they agree in showing the political attitude to be related positively 
to office worker interest and related negatively to physician interest. 

For men the economic attitude appears to be related negatively to 
lawyer interest, the aesthetic attitude appears to be related posi- 
tively to lawyer interest and to physician interest, and the religious 
attitude appears to be related negatively to lawyer interest. Either 
these same trends do not hold for women, or Duffy and Crissy’s data 
are in disagreement with those of Ferguson, Humphreys, and Strong 
and with those of Sarbin and Berdie. 

We can conclude by saying that the agreements we have been 
discussing are interesting and suggest a basic parallelism between 
the areas of personality probed by the Strong Vocational Interest
Test and by the Study of Values. But the disagreements indicate
that further research is in order before we can make completely
definitive the relations we have been examining.


8 


ADJUSTMENT: DIAGNOSTIC APPROACHES 


One of our most important daily tasks is that of making adjust- 
ments. We must adjust ourselves to the other members of our family, 
to our neighbors, to our friends, to our school chums, to our business 
associates, and so forth. And, of course, we expect the other members 
of our family, our neighbors, our friends, our school chums, and our 
business associates to make adjustments in our behalf, also. We 
already know from the doctrine of individual differences that a few 
people will be very adept at making these adjustments, a few will 
seem incapable of making any degree of adjustment whatsoever, 
and the rest of us will fall somewhere between these two extremes. 

The task which confronts us in this and in the next chapter is that 
of describing some of the methods devised to measure the degree of 
adjustment which an individual can make. We shall find it con- 
venient to discuss these methods under two general categories: 
diagnostic approaches and prognostic approaches. Under the latter
heading we shall discuss the devices measuring adjustment in
relation to a specific object and for which a definite prediction of
adjustment in a specified situation is desired. For example, under this
heading we shall discuss adjustment in marriage, vocational
adjustment, and adjustment in military service. We could discuss other
specific situations also, but these three will suffice to illustrate the
types of methodology which are generally involved.

Under the heading of diagnostic approaches, the subject of our 
present chapter, we shall discuss adjustment in general. We shall 
be concerned with the internal (if we can call it that) emotional 
adjustment of the individual. Is he inwardly happy? Is he at har- 
mony with himself, or is there some basic conflict that keeps him 
in an inner turmoil? We shall be concerned with the diagnosis of 
the presence or absence of such adjustment, for we can see, without 
elaboration, that such general adjustment is basic to a person’s 
adjusting himself in specific situations. We shall discuss two of the 
most widely used general adjustment inventories. These are the 
Bell Adjustment Inventory and the Minnesota Multiphasic Per- 
sonality Inventory. 


THE BELL ADJUSTMENT INVENTORY 


According to Bell, the first list of adjustment questions appearing 
in psychological literature is that prepared in 1905 by Heymans and 
Wiersma. It was prepared for, and used by, 400 Dutch physicians 
in determining the maladjustments of 2,415 of their patients. Some 
of the questions in this list, as given by Bell, are presented in Table 96.


TABLE 96. Some of the Questions Used by Heymans and Wiersma to Determine 
Maladjustment* 


Is the particular person active and busy (gesticulating, jumping lightly from the chair, walking
back and forth in the room), or sitting quiet?

Is the particular person discouraged easily in disagreeable tasks or is he persevering in the
completion of his intentions?

Is the particular person impulsive (grasps a situation on the impulse of the moment), does he
think it over and not act without deliberation of pro and con, or is he a man of principles
acting only in accordance with rules of conduct already established?

Is the particular person excitable (that is, is he moody over little things and easily hurt) or is
he of a happy mood (that is, moves easily among his fellows, is at ease among others); or is
he absolutely incapable of being aroused to anger (cannot be hurt, does not let himself be
aroused)?

Is the particular person happy and joyful (that is, enjoys life), is he of heavy mood and de-
pressed, does he fluctuate from one to the other, or is he always quiet and the same?

Is the particular person inclined to sink himself in abstract speculations?

Is the particular person a lover of intellectual games, puzzles, et cetera?

Is the particular person inclined to talk about things, persons, or about himself?

Is the particular person domineering, or inclined to let everyone have his freedom, or is he
easy to guide and control?

* From Bell, H. M. The Theory and Practice of Personal Counseling. Stanford University,
Calif.: Stanford University Press, 1939.


[...] greater emphasis upon sexual adjus[tment ...]
their list are given in Table 97.

The third list of note is that prepared by F. L. Wells in 1914. This
list Wells based directly upon those prepared by Heymans and
Wiersma, and by Hoch and Amsden, and he also drew to some extent
from some personal trait lists prepared by Cattell and by Davenport.

The next step in this field was that undertaken by Woodworth
when he prepared his Personal Data Sheet. We have already dis- 
cussed this (in Chap. 6) and have given the list of questions which 
Woodworth prepared. We have also indicated that there have been 
many revisions of Woodworth’s list and that his list can be con- 
sidered the grandfather of practically all personality and adjustment 
schedules now in existence. We shall find that the Bell Adjustment 
Inventory is no exception to this and that it is a direct descendant of 
the Woodworth list through just one intermediary, the Thurstones’
Personality Schedule.

TABLE 97. Some of the Questions Used by Hoch and Amsden to Determine
Maladjustment* 


In childhood was he active in play and work?

Does he make friends easily?

As a child did he play freely with other children?

Does he daydream frequently?

Does he think the world treats him ill?

Does he readily adapt himself to a new environment?

Does he get despondent without apparent reason?

Is he up and down in his moods?

Is he irritable, quick-tempered?

Does he have a marked preference or antagonism for any member of his family?

Is he natural and at ease with the opposite sex?

Does he read much?

* From Bell, H. M. The Theory and Practice of Personal Counseling. Stanford University,
Calif.: Stanford University Press, 1939.

Development of Test. In 1930 Bell gave the Thurstones’ Per- 
sonality Schedule to entering freshmen students at Chico (Cali- 
fornia) State College. He interviewed these students during the 
course of the school year and found that the test did not tap, as well 
as he thought it ought to tap, all areas of student maladjustment. 
Therefore he began a revision of the Thurstones’ schedule, and this
ultimately led to Bell’s own Adjustment Inventory. 

The first step which Bell took was to classify the items in the 
Thurstones’ schedule into various groups. These groups were:

Home life                  Attitude toward others
Health                     Social reaction
Use of time                Attitude toward life
Attitude toward sex        Emotional control
Self-feeling               Pathological tendencies

Bell next constructed “a differential scoring stencil” and used this
stencil in scoring the Thurstones’ schedule for the entering freshman 
class of 1931. Bell reports that this materially increased the useful- 
ness of the schedule. 

Bell’s third step was to devise 188 new questions and to add these 
to the 223 already in the Thurstones’ schedule. Bell, with the as- 
sistance of Dr. C. Gilbert Wrenn, prepared an a priori scoring key 
and used this key to derive an adjustment score based upon all 411 
questions. This key was validated, says Bell, “through subsequent 
use and analysis of the test.” But Bell gives no other details. 

As the fourth step in the development of the Adjustment Inven- 
tory, Bell performed an item analysis. This analysis was performed 
separately for each of the 10 adjustment areas previously defined.
Bell selected for his criterion groups the upper and lower 15 per 
cent of the score distributions. Items were not retained if they failed 
to differentiate between these two criterion groups. 
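
We can make the logic of this item analysis concrete with a short sketch in Python. It is an illustration only. Bell does not report the statistic by which he judged that an item "differentiates," so the difference in the proportion of keyed answers and the cutoff `min_diff` are assumptions made for the example, as are the function and variable names.

```python
def item_analysis(responses, fraction=0.15, min_diff=0.20):
    """Return indices of items that separate high and low scorers.

    responses: one list of 0/1 answers per student (1 = keyed answer).
    fraction:  share of students placed in each criterion group.
    min_diff:  hypothetical cutoff; the source states no numerical rule.
    """
    # Rank students by their total score on the area being analyzed.
    ranked = sorted(responses, key=sum)
    k = max(1, round(len(ranked) * fraction))
    lower, upper = ranked[:k], ranked[-k:]

    retained = []
    for i in range(len(responses[0])):
        p_upper = sum(student[i] for student in upper) / k
        p_lower = sum(student[i] for student in lower) / k
        # Keep the item only if high scorers give the keyed answer more often.
        if p_upper - p_lower >= min_diff:
            retained.append(i)
    return retained
```

Because the criterion groups are formed from the very scores to which the items contribute, a sketch of this kind shows internal consistency only. It says nothing about whether the items measure adjustment.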

Bell next applied what he called a criterion of applicability. He 
wanted items that would apply to a large number of students and so
eliminated any item which did not apply to 25 per cent or more of
[...]

Reliability. The reliabilities of the scores on the Adjustment
Inventory range between .80 and .93. These were determined by the 
split-half technique and are given in Table 98. 
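
The split-half technique can be sketched as follows. The source says only that the split-half technique was used. The odd-even division of items and the Spearman-Brown step-up to full test length are customary choices that we assume here, not details reported by Bell, and all names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """responses: one list of 0/1 item scores per subject."""
    # Assumed split: items in odd positions against items in even positions.
    first_half = [sum(r[0::2]) for r in responses]
    second_half = [sum(r[1::2]) for r in responses]
    return spearman_brown(pearson_r(first_half, second_half))
```

Under these assumptions a correlation of .80 between the two halves corresponds to an estimated reliability of about .89 for the whole scale.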

TABLE 98. Reliability Data for Bell’s Adjustment Inventory*

Home adjustment ............ .89
Health adjustment .......... .80
Social adjustment .......... .89
Emotional adjustment ....... .85
Total adjustment ........... .93

* From Bell, H. M. The Theory and Practice of Personal Counseling. Stanford University,
Calif.: Stanford University Press, 1939.

Validity. Bell states that we may consider the matter of student 
adjustment from two points of view: from the standpoint of the 
individual himself or from the standpoint of a disinterested party.
The point in making this distinction is that a person may feel him- 
self perfectly well adjusted, but a disinterested party might think 
otherwise. Or a disinterested party might consider a subject well 
adjusted when the individual himself thinks he is maladjusted. 
Bell’s interest was in the viewpoint of the individual himself—in 
the individual’s evaluation of his own behavior. The Adjustment 
Inventory is supposed to “get at” this aspect of adjustment. We 
must keep this point clearly in mind as we review the data Bell 
offers as relevant for determining the validity of his Adjustment
Inventory.

The first bit of evidence Bell offers is that the items in the Adjust-
ment Inventory show a significant differentiation between the
students falling in the upper and lower 15 per cent segments of the
score distributions. We cannot accept this as evidence for validity.
In the first place, the criterion groups were themselves selected
upon the basis of the items whose validity is at stake, and in the
second place, the results are pertinent only to item reliability, not to
item validity.


Bell’s second type of validation data lies in his interview material. 
He states that item responses were found to be consistent with
remarks made during these interviews. This may well be true, but
it would seem that all that this proves is that a person will put down
on paper the same things he is willing to divulge in an interview.
This does not answer the basic question as to whether these re-
sponses are really indicative of adjustment, or of maladjustment, as
the case may be.
Bell’s third type of validating data consists of the correlations of
the scores on his inventory with other measures of adjustment. He
submits correlations with the scores on the Thurstones’ Personality
Schedule, the Allports’ Ascendance-Submission Reaction Study,
and Bernreuter’s Personality Inventory. These correlations are
presented in Table 99.


TABLE 99. Correlations between Certain Bell Adjustment Scores and Scores on Other
Inventories*

Scales intercorrelated                                                      N      r
Social adjustment vs. Allports’ Ascendance-Submission score (men)          46    .58
Social adjustment vs. Allports’ Ascendance-Submission score (women)        50    .67
Emotional adjustment vs. Thurstones’ Personality Schedule                  96    .83
Total adjustment vs. Thurstones’ Personality Schedule                      96    .89
Social adjustment vs. Bernreuter’s B4-D scale                              39    .79

* From Bell, H. M. The Theory and Practice of Personal Counseling. Stanford University,
Calif.: Stanford University Press, 1939.

These correlations are highly questionable as indices of validity.
They involve the assumption that the Allports’, the Thurstones’,
and Bernreuter’s inventories are, themselves, valid indicators of
maladjustment. And of this assumption, as we pointed out in Chap.
6, there is no positive proof. Next, many of the questions in Bell’s
inventory were taken from the Thurstones’ schedule. And both
Bernreuter and the Thurstones took many of their items from the
Allports’ Ascendance-Submission Reaction Study. Thus Bell, in
reporting the correlations in Table 99, is not reporting correlations
with independent measures of adjustment. This makes these correla-
tions spuriously high. They cannot be taken as indicating validity
in the adjustment scores. If the Bell Adjustment Inventory is a valid
test, we must look elsewhere for supporting data.

The last type of validity data which Bell presents is to the effect
that the scores on his inventory differentiate between counselor-
[...] poorly adjusted students, Bell [...]
differentiating scores on the Adjustm[ent Inventory ...]
in connection with each scale were lo[cated in ...]
cities: Chico, California (home and [...];
[...] Heights, New Jersey (home and health adj[ustment); ...]
California (health adjustment); Sacramento, California (social
adjustment); and Pasadena, California (emotional adjustment).
Clearly, the students used in connection with the social adjustment 
and emotional adjustment scales are different from each other, and 
both are different from those used in connection with the home 
adjustment and health adjustment scales. Bell is not entirely clear, 
however, as to whether there is or is not any overlap in the student 
groups used in connection with the home adjustment and health 
adjustment scales. The adjustment scores for these various groups 
of adjusted and maladjusted students are given in Table 100. 


TABLE 100. Bell Adjustment Scores for Counselor-selected Adjusted and Maladjusted
Students*

[?] = entry illegible

                                                                        Standard deviation
Scale                   Number   Adjusted   Maladjusted   Difference    of the difference
Home adjustment          [?]       4.65       10.27          5.62             .80
Health adjustment        [?]       5.40       11.53          6.13             .93
Social adjustment         24       8.40       16.80          8.40            1.52
Emotional adjustment      36       8.28       16.28          7.50            1.42

* From Bell, H. M. The Theory and Practice of Personal Counseling. Stanford University,
Calif.: Stanford University Press, 1939.
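
One way to read Table 100 is to divide each difference by its standard deviation, which gives a critical ratio of the kind used later in this chapter. Bell's table does not print these ratios, so the sketch below is our own arithmetic on the tabled values. It is limited to the three scales whose printed means and differences agree with one another, and the names are illustrative.

```python
def critical_ratio(mean_adjusted, mean_maladjusted, sd_of_difference):
    """Difference between two group means in units of its standard deviation."""
    return (mean_maladjusted - mean_adjusted) / sd_of_difference


# (adjusted mean, maladjusted mean, standard deviation of the difference)
table_100 = {
    "Home adjustment": (4.65, 10.27, 0.80),
    "Health adjustment": (5.40, 11.53, 0.93),
    "Social adjustment": (8.40, 16.80, 1.52),
}

for scale, (adjusted, maladjusted, sd_diff) in table_100.items():
    ratio = critical_ratio(adjusted, maladjusted, sd_diff)
    print(f"{scale}: {ratio:.1f}")
```

Each of the three ratios is above 5, which agrees with the statement that the inventory separates the counselor-selected groups.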


These data are interesting and indicate, without doubt, that 
Bell’s Adjustment Inventory can differentiate between counselor- 
selected groups of well-adjusted and counselor-selected groups of 
poorly adjusted students. These would seem to be evidence for 
validity until we recall Bell’s assertion that his test was designed 
to measure adjustment or maladjustment from a student’s point of
view and not necessarily from a counselor’s point of view. Therefore
we must reject the data in Table 100 as irrelevant to the hypothesis
to be tested. We have now exhausted the evidence which Bell him-
self has presented, and we must conclude that the validity of the Ad-
justment Inventory has in no way been substantially demonstrated.


THE MINNESOTA MULTIPHASIC PERSONALITY INVENTORY 


We have discovered throughout this volume that one of the most
popular methods of test construction is that which relies upon the
method of internal consistency. The second most popular method
is that which utilizes the scores of other tests as a criterion. One
of the least used methods of personality-test construction is that of
using criterion groups which have been selected independently of
the test to be validated. So far in this volume we have run into this
technique only twice, in our discussion of the Strong Vocational
Interest Test and in our discussion of the Terman-Miles Masculinity-
Femininity Test. We are now to see it used again, however, in the
development of the Minnesota Multiphasic Personality Inventory.
The authors of this inventory, Hathaway and McKinley, have
developed several adjustment questionnaires based directly upon
the responses of preselected and carefully defined groups of adjusted
and maladjusted individuals.

Development of the Inventory. [In developing this] inventory
Hathaway and McKinley’s [first step, like that of other] investi-
gators, was that of collecting [items. In their search]
for items their goals were to secure a [...]
used in case studies, [...] personality and adjustment [...]
[decla]rative statements in the first [person ...]
listed in Table 101.

These items Hathaway and McKinley printed in two forms. In
[one form the items are printed in a bookl]et with instructions for giving
the answers on an IBM answer sheet. In the other form, and this is
the original form, the i[tems are print]ed individually in large type
on 3- by 5-inch cards. These cards are divided about equally into
two sections, [each of which] is distinctively marked with a
colored stripe along the top edge. The two groups of cards are housed
in appropriate file boxes with three guide cards marked: “True,”
[“False,” and] “Cannot Say.” After a subject has been tested and after
[...] cards are [shuffled ...]
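TABLE 101. Classification of Items in the Minnesota Multiphasic Personality
Inventory*

                                                                      Number
Topic                                                                of items
 1. General health ..................................................    9
 2. General neurologic ..............................................   19
 3. Cranial nerves ..................................................   11
 4. Motility and coordination .......................................    6
 5. Sensibility .....................................................    5
 6. Vasomotor, trophic, speech, secretory ...........................   10
 7. Cardiorespiratory ...............................................    5
 8. Gastrointestinal ................................................   11
 9. Genitourinary ...................................................    6
10. Habits ..........................................................   20
11. Family and marital ..............................................   29
12. Occupational ....................................................   18
13. Educational .....................................................   12
14. Sexual attitudes ................................................   19
15. Religious attitudes .............................................   20
16. Political attitudes (law and order) .............................   46
17. Social attitudes ................................................   72
18. Affect, depressive ..............................................   32
19. Affect, manic ...................................................   24
20. Obsessive, compulsive ...........................................   15
21. Delusions, hallucinations, illusions, ideas of reference ........   31
22. Phobias .........................................................   29
23. Sadistic, masochistic ...........................................    7
24. Morale ..........................................................   33
25. Is individual trying to place himself in improbably acceptable
    or unacceptable light ...........................................   15

* From Hathaway, S. R., and McKinley, J. C. A multiphasic personality schedule: I. Con-
struction of the schedule. J. Psychol., 1940, 10, 249-254.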


[...] of item position and to facilitate the addition or deletion of items to
or from any scale. The directions for taking the test are as follows:

Take the cards out from the front, one at a time and decide whether each is true
or not.

If it is mostly true about you, put it behind the card that says TRUE.

If it is not mostly true about you, put it behind the card that says FALSE.

If a statement does not apply to you, or is something that you don’t know about,
put it behind the card that says CANNOT SAY.

There are no right or wrong answers.

Remember to give your opinion of yourself.

There are two boxes in this set.

In order that we may use your results, both boxes must be completed.

To date, Hathaway and McKinley have developed eight scales
designed “to assay those traits that are commonly characteristic
of disabling psychological abnormality,” a scale for masculinity-
femininity, and four special validating scales. Of [interest]
at this point are the eight diagnostic scales. These are for the meas-
urement of hypochondriasis, symptomatic depression, psychasthenia,
hysteria, hypomania, psychopathic deviate, paranoia, and schizo-
phrenia. Each of these scales has been based upon a comparison of
item responses between a clinically diagnosed group and a control
group of normals. We shall discuss these criterion groups in greater 
detail later, but here we can point out that they represent as care- 
fully selected criterion groups as have been available for the purposes 
of personality-test construction. All abnormal criterion groups 
consist of hospital patients and fall clearly into the classification or 
clinical syndrome involved. Diagnoses have been made directly by a 
clinician from a mimeographed symptomatic tabulation sheet or 
have been made following a review of the records supplied by a 
neuropsychiatric hospital staff. 

Normal controls have been drawn from a variety of sources, but 
the great majority have been visitors at the hospital and high-school 
graduates tested at the University of Minnesota testing bureau. 
They have also included nonpsychiatric patients in the general 
ward of the hospital. 

Hypochondriasis. The first scale which Hathaway and McKinley 
developed was for hypochondriasis, that is, “abnormal, psycho- 
neurotic concern over bodily health.” The criterion groups were 50 
clinically selected hypochondriac patients (abnormal group) and 
two groups of normal controls. These latter groups consisted of 109 
adult married males and 153 adult married females (first normal 
control group) and 265 University of Minnesota freshmen, the great 
majority of whom were adolescent and single (second normal control 
group). 

Hathaway and McKinley tabulated the responses given by these 
subjects and determined the percentages of the normal and abnor-
mal groups giving each possible answer to each one of the 504 items.
Then they determined the differences between the percentages for
the normal and abnormal groups, computed the standard errors of
these differences, and determined a critical ratio for each item. Then
Hathaway and McKinley eliminated all items which failed to yield a
difference twice as large as its own standard error. They also elimi-
nated items for other reasons, one of these other reasons being that
the item occurred with greater frequency in the college normal group
than in the hypochondriac group. In other instances, the difference
seemed to be more closely related to a difference in marital status
or in attitude toward children than to the clinically diagnosed dis-
order of the abnormal criterion group. As might be surmised, Hath-
away and McKinley had to revise their scale a number of times
before they felt it to be maximally effective. In these revisions they
tried various methods of weighting the items, some methods being
based upon clinical judgments of importance and other methods
being based upon the reliability of the differences between the
criterion groups. None of the weighting systems of scoring were
found to be better than a simple unit-weighting system, so the latter
was adopted. As a result of these procedures, 55 items were selected
to comprise the scale for hypochondriasis.
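
The item-selection rule described above can be sketched as follows. The source states only that the difference between percentages was compared with its standard error. The particular standard-error formula used here, the conventional one for two independent proportions, is an assumption, and the function names and sample figures are hypothetical. The group sizes of 50 hypochondriacs and 262 married normals are taken from the text.

```python
from math import copysign, inf, sqrt


def item_critical_ratio(p_abnormal, n_abnormal, p_normal, n_normal):
    """Difference between two proportions divided by its standard error.

    Proportions are fractions (0.60 means 60 per cent). The formula for
    the standard error is assumed, not quoted from the source.
    """
    diff = p_abnormal - p_normal
    se = sqrt(p_abnormal * (1 - p_abnormal) / n_abnormal
              + p_normal * (1 - p_normal) / n_normal)
    if se == 0:
        # Both groups answered unanimously, so no sampling error is estimated.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def select_items(item_stats, n_abnormal, n_normal, cutoff=2.0):
    """Keep items whose difference is at least `cutoff` standard errors.

    item_stats maps an item label to (p_abnormal, p_normal).
    """
    return [
        item
        for item, (p_abn, p_norm) in item_stats.items()
        if abs(item_critical_ratio(p_abn, n_abnormal, p_norm, n_normal)) >= cutoff
    ]
```

With 50 abnormal and 262 normal cases, a hypothetical item answered "True" by 60 per cent of the patients and 20 per cent of the normals would be retained, while one answered "True" by 30 and 28 per cent would be dropped. The further eliminations made on clinical grounds are not represented in this sketch.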

The next step undertaken by Hathaway and McKinley was that
of scoring all hospitalized psychiatric cases on the Hypochondriasis
scale. Hypochondriacs received, as they should, significant scores.
And normals received, as they should, nonsignificant scores. But a
large group of psychiatric patients, not hypochondriacs, received
high scores. To correct this Hathaway and McKinley selected from
the nonhypochondriacs the 50 cases who on the Hypochondriasis
scale received the highest scores. They compared the responses of
this group with those of the 50 clinically diagnosed hypochondriacs
and prepared a new scale differentiating these two groups from each
other. In selecting items for this scale, those that were already
included in the Hypochondriasis scale were omitted, but from the
remainder Hathaway and McKinley retained all items yielding
significant differences. Forty-eight items met the criterion of signif-
icance and were included in the Hypochondriasis Correction scale.

To determine the final hypochondriacal score, it is necessary to
use both the Hypochondriasis and Hypochondriasis Correction
scales and to subtract the score on the latter from the score on the
former. This score is designated the H − Ch score. What this correc-
tion accomplishes is illustrated in Table 102.
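
The scoring rule just stated can be sketched in a few lines. Unit weighting and the subtraction of the correction score come from the text. The item numbers and keyed answers below are hypothetical and stand in for the actual 55-item and 48-item keys, which are not reproduced here.

```python
def unit_score(answers, key):
    """Count the items answered in the keyed direction, one point each."""
    return sum(1 for item, keyed in key.items() if answers.get(item) == keyed)


def corrected_score(answers, h_key, ch_key):
    """Hypochondriasis score minus Hypochondriasis Correction score."""
    return unit_score(answers, h_key) - unit_score(answers, ch_key)


# Hypothetical keys and answers, for illustration only.
h_key = {1: True, 2: True, 3: True}
ch_key = {4: True}
answers = {1: True, 2: False, 3: True, 4: True}

print(corrected_score(answers, h_key, ch_key))
```

Items sorted under "Cannot Say" can simply be left out of `answers`, in which case they add nothing to either score. This treatment is a choice made for the sketch, not a rule given in the source.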

The normal control group and the hypochondriac criterion groups
are clearly segregated as they should be, and this differentiation
holds up on new cases, as shown in the cross-validating groups. Note,
however, that the nonhypochondriac groups who secure high scores
on the Hypochondriasis scale receive much lower scores on the
H − Ch scale. Yet the differentiation on this scale between the
two original criterion groups and the two new [...] groups
is just as satisfactory as on the Hypochondriasis scale. It is
clearly worth while, argue Hathaway and McKinley (and in view
of these data we have no reason to doubt them), to make this correc-
tion and to use the H − Ch score as the measure of hypochondriasis.

The logic involved in correcting the original scale by means of a
second scale is simple. Psychiatric cases differ from normals, but
also differentially diagnosed groups of psychiatric cases differ among
themselves. The Hypochondriasis scale differentiates hypochondriacs
from normals, but the H − Ch scale is needed to differentiate hypo-
chondriacs from nonhypochondriacal psychiatric patients.


TABLE 102. Validity Data for Hathaway and McKinley’s Hypochondriasis Scales*

[?] = entry illegible

                                                       Hypochondriasis        H − Ch
                                                               Critical            Critical
Group                                        Number    Mean    ratio†     Mean     ratio†
Normal married males                           123     10.9     ...       [?]       ...
Hypochondriacs (critical cases)                 50     29.1    15.4      14.3      14.[?]
[?]                                            [?]     26.2    13.0       0.3       4.5
Cross-validating groups:
  Hypochondriacs                                25     29.1    13.0      14.0      13.1
  Symptomatic hypochondriacs                    28     20.9     5.0       3.6       5.1
  High hypochondriasis score without
    symptomatic hypochondriasis                 17     22.3    10.4       1.2       3.6

* From McKinley, J. C., and Hathaway, S. R. A multiphasic personality schedule: II. A
differential study of hypochondriasis. J. Psychol., 1940, 10, 255-268.
† Each group compared with normal controls.


Symptomatic Depression. The second scale which McKinley and 
Hathaway developed was one designed to measure symptomatic 
depression. Symptomatic depression is, according to Hathaway 
and McKinley, “a clinically recognizable general frame of mind 
characterized by a poor morale, lack of hope in the future, and dis- 
satisfaction with the patient’s own status generally.” 

In the development of the Depression scale, Hathaway and 
McKinley utilized the responses of the following criterion groups: 


1. 139 normal married males, age 26 to 43 


2. 200 normal married females, age 26 to 43 
3. 256 college students 


4. 50 carefully selected depressed patients (in the depressed phase of manic- 
depressive psychosis) 


The item responses of the depressed cases were compared with 
those of the normal controls, and, as in the development of the 
Hypochondriasis scale, items with a difference twice their standard 
error were retained. Items showing sex differences were eliminated, 
however, and this left 70 items. This scale of 70 items was found to
make significant differentiation between normal and depressed cases,
but some patients not depressed received high depression scores.
Therefore two additional criterion groups were selected for study.
One of these groups consisted of 50 nondepressed patients with high
depression scores, and the other group consisted of 40 depressed
normals. Item responses for these groups were tabulated. Then, to
be included in the final Depression scale, an item had to show a
progressive increase in frequency from normal through depressed
normal to depressed psychotics, and the percentage for the non-
depressed patients was required to approach that for normals. Sixty
items met these criteria and they now constitute the Depression
scale. Data on its validity are set forth in Table 103. There is clearly
no question that it differentiates between normal and depressed
individuals.

TABLE 103. Validity Data for Hathaway and McKinley’s Depression Scale*

[Columns: Group, Number, Mean, Critical ratio†. Row labels legible in the source:
Criterion; Test depressed; Non-depressed; Symptomatically depressed; Random
psychiatric cases; Physically ill. Most entries are illegible; one row reads
36.68 (mean) and 26.9 (critical ratio).]

* From Hathaway, S. R., and McKinley, J. C. A multiphasic personality schedule: III. The
measurement of symptomatic depression. J. Psychol., 1942, 14, 73-84.
† Each group compared with normal controls.

Psychasthenia. This term signifies individuals whose thinking
is characterized by excessive doubt, by compulsions, obsessions, and
unreasonable fears; by “great doubts as to the meaning of his
reactions in what seems to be a hostile environment”; and by “a
weakened will that cannot resist the behavior regardless of its
maladaptive character.”

The criterion groups which Hathaway and McKinley used in the 
development of this scale were as follows: 

1. 139 normal married males, age 26 to 43, and 200 normal married females, 
age 26 to 43 

2. 20 psychasthenia patients 




Hathaway and McKinley constructed a preliminary scale by 
selecting all items that yielded differences with critical ratios of 2.0 
or more between the criterion groups. Then they computed tetra- 
choric correlations between the responses for each item and total 


Table 104. Validity Data for Hathaway and McKinley’s Psychasthenia Scale*

                                                                  Percentage equal to
                                           Standard    Critical   or greater than
Group                      Number   Mean   deviation   ratio      mean of normals
Normals                      690    11.70    ...
Criterion cases               20    27.05    9.4        7.2            95
Symptomatic psychiatric       50    21.02    9.1        7              90
Other psychiatric            576    16.15   10.1        8.6            63
Physically ill               266    12.12    ...        ...            48
College students             270     7.99    6.0        7.9            28

* From McKinley, J. C., and Hathaway, S. R. A multiphasic personality schedule: IV.
Psychasthenia. J. appl. Psychol., 1942, 26, 614-624.


Table 105. Intercorrelational Data for the Minnesota Multiphasic Personality Inventory*
100 normals 100 psychopathics 
Group Psycho- Psycho- 
Hysteria} Hijo pathic Hivaverial Hypo- pathic 
mania | desiate | mania | deyiate 
= 2 Sian || NEN, 
Psychopathic deviate........ : -37 49 | i -18 43 
Hysteria... fa saint Al, snags Od BR | eae i me 13 18 
Depression. 255 — .02 | 29 68 | —.21 lt 
Hypochondriasis, 52 -28 | 42 A| = “08 37 
Psychasthenia AG 5k, 39 | 48 -33 14 23 
Paranoia, . Nadas ts At 30| 38 40 31 40 
Hypomania... 105) sn 9 aa 45 
Schizophrenia... .28 60 23 a 36 31 
* From McKinley, J. C., and Hathaway, S. R. The MMPI: V. Hysteria, hypomania, and
psychopathic deviate. J. appl. Psychol., 1944, 28, 153-174.

scores for 100 normals and for 100 “randomly selected psychiatric
patients.” Then they included in their final scale all items that
... one or the other or in both of
these groups. Forty-eight items met their test, and these items now
constitute the Psychasthenia scale.
Data relevant to the validity of the Psychasthenia scale are presented
in Table 104. The number of psychiatrically diagnosed
psychasthenia patients is small, and no cross-validating group was 
available. 

Hysteria, Hypomania, and Psychopathic Deviate. Scales for hys- 
teria, hypomania, and psychopathic deviate were developed in much 
the same fashion as those already described, so we need not discuss 
their detailed development. Their intercorrelations with the other 
scales on the Minnesota Multiphasic Personality Inventory are set 


forth in Table 105. 


SPECIAL VALIDATING SCALES 


We have commented before upon the problem of getting honest,
objective, and straightforward responses to the items in a personality
test. The Minnesota Multiphasic Personality Inventory is, of course,
not immune from this problem. Hathaway and McKinley have been
well aware of this fact and have attempted to mitigate the seriousness
of the problem by developing four special validating scales.
These are the Question score (?), the Lie score (L), the Validity
score (F), and the Correction score (K). The development of these
scales has been rather different from that of the diagnostic scales we
have already discussed, so we shall have to give them separate
treatment.

The Question Score. This consists of nothing more than the total
number of items classified in the “cannot say” category. The average
subject will place 30 or fewer items in this category. When the
number of items so classified exceeds 30, Hathaway and McKinley
feel that the scores on the various diagnostic scales are automatically
lowered. And if the number of items classified as “cannot say” is
130 or more, Hathaway and McKinley feel that all of the diagnostic
scores must be considered invalid. Hathaway and McKinley’s only
validation for the Question score consists in their own clinical
judgment that subjects with high Question scores do not appear to
have responded to the test items in a properly objective manner.

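The cutoffs just described can be stated in a few lines. What follows is only a minimal sketch: the function name and the three labels are illustrative assumptions and form no part of the Inventory; only the thresholds of 30 and 130 come from the text.

```python
def question_score_status(cannot_say_count: int) -> str:
    """Classify a record by its Question score (number of "cannot say" items).

    The returned labels are hypothetical. The thresholds (30 and 130) are
    the ones attributed to Hathaway and McKinley in the text.
    """
    if cannot_say_count < 0:
        raise ValueError("the number of items cannot be negative")
    if cannot_say_count >= 130:
        return "invalid"   # all diagnostic scores considered invalid
    if cannot_say_count > 30:
        return "lowered"   # diagnostic scores presumed to be lowered
    return "typical"       # 30 or fewer, as for the average subject


if __name__ == "__main__":
    for count in (12, 45, 130):
        print(count, question_score_status(count))
```

The boundary cases follow the wording of the text: exactly 30 items is still treated as typical, and exactly 130 items is already treated as invalid.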
The Lie Score. This score “affords,” say Hathaway and McKinley,
“a measure of the degree to which the subject may be
attempting to falsify his score by always choosing the response that
places him in the most acceptable light socially.” High Lie scores,
according to Hathaway and McKinley, indicate that the scores
on the diagnostic scales are too low.

The Lie scale consists of 15 items on which a completely honest 
person is apt to get a very low score. They are representative of 
socially desirable situations which are rarely apt to be true. There- 
fore, when a subject secures a fairly high score on the 15 items in the 
Lie scale his entire record becomes suspect. Like the Question scale, 
the only validation for the Lie scale lies in Hathaway and Mc- 
Kinley’s clinical judgment. 


The Validity Score. This scale is composed of 64 items which
are answered infrequently by normal subjects. Therefore a high
score is considered as evidence for faking, for careless marking, or as
evidence that the subject could not understand the meaning of the
items. A low score is considered as evidence that the scores on the
diagnostically significant scales are valid. Here again Hathaway and
McKinley fail to give us supporting evidence.

The Correction Score. This scale consists of 30 items. Twenty- 
two of these items Hathaway and McKinley found useful in dif- 
ferentiating clinically diagnosed abnormals with normal profiles 
from normal control groups. The remaining 8 items were included 
because they were found to differentiate depressed or schizoid 
subjects from normal controls. The first of these two groups of items
taps, Hathaway and McKinley argue, the test-taking attitude of 
trying to look better than one actually is, and the second set of 8 
items taps the test-taking attitude of trying to look worse than one 
actually is. The net score, as a result of the operation of these two 
sets of items, indicates a correction to increase the validity of each 
of the diagnostic scales. This correction varies for the different scales, 
being the full amount of the score in two instances, one-half in one
instance, four-tenths in another, and two-tenths in another. These
corrections are added to the scores ...

... 174 normal controls, all with borderline profiles. Then they divided
the entire group of 511 cases into two groups. This division was 
based upon the median correction score, so that 50 per cent of the 
cases were placed in each of the two groups. This division of cases 
made it possible for Hathaway and McKinley to identify correctly 
more than half of both the normal and abnormal groups. The exact 


percentages of cases correctly identified are as follows: 


72 per cent of the normal men 

59 per cent of the normal women 

61 per cent of the abnormal men 

66 per cent of the abnormal women 

Some of these percentages may not seem high, but they are all 
better than chance and appear to increase by a significant margin 
the accuracy of the diagnoses based upon the scores of the Minnesota 


Multiphasic Personality Inventory. 
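The fractional corrections described earlier in this section (the full Correction score for two scales, one-half for one, four-tenths for another, and two-tenths for another) can be illustrated as follows. This is a sketch under stated assumptions: the passage does not say which scale receives which fraction, so the assignment below follows the usual MMPI convention, and the raw scores in the example are invented for illustration.

```python
# Fraction of the Correction (K) score added to each diagnostic scale.
# ASSUMPTION: the text gives only the fractions; the pairing with the
# scale abbreviations below follows the customary MMPI convention.
K_FRACTIONS = {
    "Hs": 0.5,  # hypochondriasis: one-half of K
    "Pd": 0.4,  # psychopathic deviate: four-tenths of K
    "Pt": 1.0,  # psychasthenia: full K
    "Sc": 1.0,  # schizophrenia: full K
    "Ma": 0.2,  # hypomania: two-tenths of K
}


def apply_k_correction(raw_scores, k_score):
    """Return scores with the scale's fraction of K added.

    Scales without an entry in K_FRACTIONS are returned unchanged.
    """
    return {
        scale: raw + K_FRACTIONS.get(scale, 0.0) * k_score
        for scale, raw in raw_scores.items()
    }


if __name__ == "__main__":
    # Hypothetical raw scores, for illustration only.
    raw = {"Hs": 12, "D": 22, "Pd": 20, "Pt": 25, "Sc": 24, "Ma": 18}
    print(apply_k_correction(raw, k_score=10))
```

A table of fractions keeps the correction rule separate from the arithmetic, so a different assignment of fractions to scales could be tried without touching the function.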


9 


ADJUSTMENT: PROGNOSTIC APPROACHES 


Our discussion in the last chapter centered around diagnostic ap- 
proaches to the measurement of adjustment. By diagnostic ap- 
proaches we mean approaches concerned with immediate adjustment 
and adjustment related to the general mental health of an indi- 
vidual. We want to turn our attention in this chapter to prognostic 
approaches to the measurement of adjustment. Prognostic ap- 
proaches are designed to predict adjustment at some future date 
and, usually, in a defined and specific situation. Thus, we shall 
discuss adjustment in marriage, ... occupational calling, a...
be concerned with the prediction of the degree of adjustment in
these specific situations.

We shall have to con... be the Aptitude Index ... insurance selling.
And our fourth example will be Shipley and Graham’s Personal
Inventory for predicting adjustment in military (naval) service.


BURGESS AND COTTRELL 


Burgess and Cottrell ... which adjustment in marriage could be ...




established empirically the extent to which each of the personal- 
history and background items, and all of these items together, could 
give forehand knowledge of the score on their marital adjustment 
scale. In Burgess and Cottrell’s own words, they “first sought to
define the problem of marriage adjustment; second, to find what 
factors present at the time of marriage are associated with marital 
success or failure; and third, to determine whether or not it is possi- 
ble to devise a method of predicting before marriage its outcome in 
marital happiness or unhappiness.” 

Burgess and Cottrell began their study by collecting a large num- 
ber of personal-history, personality, background, and marital ad- 
justment items. They divided these items into two sets, one including 
the items of a more personal nature and the other including the 
items of a less personal nature. It is this latter set that chiefly con- 
cerns us, for it was prepared in questionnaire form, and we are to 
discuss the data secured in response to it. 

Subjects. Burgess and Cottrell prepared a preliminary form of 
this questionnaire and tried it out on 100 subjects before it was made 
ready for final use. The questionnaire completed, Burgess and 
Cottrell’s next task was to secure subjects. They located these 
through students, interested individuals, social agencies, the general 
mail, and apartment-house mailboxes. They distributed over 7,000 
copies of their questionnaire and received 1,300 replies. From these 
replies they selected 526 for intensive study. Of these 526 question- 
naires, 153 were completed by the husband, 317 by the wife, 30 were 
completed together, 15 were completed by the wife or husband with 
an interviewer’s assistance, and 11 were completed in an unknown 
way. The questionnaires selected for study had to represent: 


1. Couples resident in Illinois
2. Couples married more than one year but not more than six years
3. Divorced couples married not more than six years

Of the questionnaires selected for study 80 per cent represented
subjects from north European cultural groups, more than half of
the questionnaires represented Protestant subjects, and more than
half represented college subjects. The great majority represented
subjects reared in cities, and 80 per cent represented subjects resident
in or near Chicago. Husbands came primarily from the white-collar
and professional groups. About one-third of the couples had one
child, and the average length of time married (for all couples) was
three years. Husbands averaged 26 years in age and wives, 23.

Marital Happiness Scale. The criteria which Burgess and Cottrell
set forth as indicative of marital adjustment ... matters,
demonstrations of affection, ... table manners, matters of
conventionality, philosophy of life, and ways of dealing with in-laws;
the p... of common interests and ... of affection; few complaints;
infrequent or no feelings of loneliness, miserliness, irritability, or
lack of self-confidence; and an over-all feeling of happiness.
Altogether 28 items were involved.

Happiness Ratings. Burgess and Cottrell considered the most
basic of these criteria to be the over-all feeling of happiness. Their
questionnaire required each respondent to rate the happiness of the
marriage (not that of each spouse individually) on a five-step rating
scale. The distribution of ratings which they secured is given in
Table 106.

Table 106. Distribution of Subjective Happiness Ratings*

Very happy          42.6%
Happy               20.5
Average             14.4
Unhappy             13.5
Very unhappy         8.0
No reply             1.0
Total              100.0%

* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.


How adequate can these ratings be considered? Are they objective?
Are they reliable? Are they valid? ... these questions
Burgess and Cottrell compared the distribution in Table 106 with a
distribution showing ..., compared the ratings
given by one member of a couple with th..., and comp...
independently by husband and wife.

The comparison of the distribution of the spouses’ happiness ratings
with that of parental happiness ratings was apparently intended
to “get at” the objectivity of the first distribution of ratings. Burgess
and Cottrell found that these two series of ratings did not differ 
appreciably from each other and took this as evidence supporting 
the claim of objectivity in the spouses’ ratings. We cannot be en- 
thusiastic about this “proof,” however, since there is little reason to 
suppose that halo or error, if present, will not affect both sets of 
ratings. 

The remaining comparisons mentioned in the first of the two foregoing
paragraphs are relevant to reliability. Burgess and Cottrell
report that 251 husbands and 251 wives provided independent
ratings of the happiness of their marriage and that these independent 
ratings were highly correlated (coefficient of contingency = .80). 
We cannot accept this evidence uncritically. All that Burgess and 
Cottrell can mean by independent ratings is that both husband and 
wife furnished a rating. But the replies were received by mail, and, 
in most cases, there was no contact with the respondent. Therefore 
there was no way for Burgess and Cottrell, or for us, to determine 
the extent of collaboration which may have taken place in making 
these ratings. 

In 272 cases an outsider rated the happiness of the marriage. These 
ratings correlated .91 on a tetrachoric basis, or .68 on a coefficient 
of contingency basis, with those furnished by a member of the 
couples themselves. For 38 couples two judges assigned ratings upon 
the basis of case histories. Their first ratings (averaged) correlated 
.86 (tetrachoric) with their second ratings, and .96 (tetrachoric) 
with those assigned by a member of the couples themselves. 
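Since several of the figures above are coefficients of contingency, it may help to see how such a coefficient is obtained from a cross-tabulation of two sets of ratings. The sketch below uses the standard definition C = sqrt(chi-square / (chi-square + N)); the function names and the small tables in the example are hypothetical and are not Burgess and Cottrell’s data.

```python
import math


def contingency_coefficient(table):
    """Pearson's coefficient of contingency for a table of counts.

    `table` is a list of rows; each cell is the number of cases falling
    in that combination of categories.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi_square += (observed - expected) ** 2 / expected
    return math.sqrt(chi_square / (chi_square + n))


def max_contingency(k):
    """Upper limit of the coefficient for a k-by-k table."""
    return math.sqrt((k - 1) / k)


if __name__ == "__main__":
    # Hypothetical 2-by-2 table with perfect agreement between raters.
    print(round(contingency_coefficient([[50, 0], [0, 50]]), 3))
    # Upper limit for a five-step rating scale crossed with itself.
    print(round(max_contingency(5), 3))
```

Because the coefficient cannot reach 1.00 in a table with a limited number of categories, it will generally run lower than a tetrachoric coefficient computed on the same ratings, which is consistent with the paired values reported above.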

Objections can be raised against the use of happiness ratings as a
criterion for marital success or failure. Those specifically listed by
Burgess and Cottrell are that marital happiness cannot be defined
in an objective manner, that the meaning of marital happiness varies
from one person to another, that an individual’s own conception of
marital happiness varies from day to day, that a marriage may be
happy for one spouse and unhappy for the other, that people may
not ... in giving their ratings, and that there is no completely
... of ... the reliability and validity of the
ratings. In spite of these recognized objections, Burgess and Cottrell
argue that the happiness ratings as they secured them can be made
to serve a useful purpose.

Table 107. The Relation of Adjustment Item Responses to Over-all Ratings of Marital Happiness*
Contin- Tetra- Maxi- 
Item gency choric mum 
coefficient | coefficient | weight 
State approximate extent of agreement or disagreement 
on following items: 
Handling family Agaa iaso aanname 50 -69 10 
Matters of recreation. . 48 265 10 
Religious matters...... 28 38 5 
Demonstrations of affection 45 65 10 
BIGNESS AAS 47 .60 10 
Intimate relations. 50 -61 10 
Caring for the baby 41 40 
Table manners 22 Be 5 
Matters of conventionalit 43 at 10 
Philosophy of life........ 48 -62 10 
Ways of dealing with in-laws....... id 46 .66 10 
When disagreements arise, they usually result in; hus- 
band giving in . . . ; wife giving in . . . ; agreement 
by mutal giveiand take: «ois, ...nceesee ence neveness 45 -70 10 
Do husband and wife engage in outside interests together: 
all of them... ; some of them... ; very few of | 
EE I E R 3: ee tae apes sites. He crags .48 -76 10 
In leisure time husband prefers: to be on the go . . . | 
stay at home . . . ; wife prefers to be on the BO a a 
to stay athome... ... 44 -70 10 
Do you kiss your husband (wife 
sionally . . . ; almost never . . . 45 -69 10 
Do you confide in your husband (wife Imost never 
ew. 3 rarely... 5 in most things . ; in every- 
TINE 1. ce: 0: wien E ax nik | -47 sS 10 
Do you ever wish you had not married? Frequently . | 
occasionally . . . ; rarely . . . ; never. . 63 86 15 
If you had your life to live over, do you think you would: | 
marry the same person . . . ; marry a different person | 
+ 3 not marry atall... -58 .87 15 
| 
41 55 10 
35 53 7 
—.31 1 
30 1 
= Bi 1 
= 60 1 
AS 1 
—.47 1 
27 1 
* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.




Adjustment Ratings. But as the ratings stand in Table 106, they 
cannot be said to yield a very refined series of discriminations. 
Furthermore, a change of a rating from one step to another would 
constitute a major variation. To secure a scale capable of making 
finer discriminations and one on which any given fluctuation would 
not constitute such a violent change in meaning, Burgess and Cot- 
trell proceeded to compute the correlation between each of the 
adjustment items and the over-all happiness ratings. These correla- 
tions, together with the adjustment items, are given in Table 107. 

Item Weights. Next Burgess and Cottrell assigned scoring weights. 
These are also given in Table 107. The maximum weight for each 
item was made proportional to its correlation with the happiness 
ratings, and the other answers were assigned points in accord with 


the way in which happy and very unhappy ratings were distributed 


among these answers. A concrete example of this weighting process 


is given in Table 108. 
Table 108. The Assignment of Scoring Weights*

Percentage    Weight    Rating
   ...          15      Never
  18.2           4      Rarely
   8.0           2      Occasionally
   ...           0      Frequently

* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.

The weight of 15 was assigned upon the basis of the correlation
between the responses to this item and the self-ratings on marital
happiness. A value of 4 was assigned to the response “rarely”
because the percentage for this response was roughly one-fourth
that for the response “never.” A value of 2 was assigned to the
response “occasionally” because the percentage for this response
was roughly one-half that for the response “rarely.” According to
this scheme it appears logical that the response “frequently” should
be assigned a weight of 0.
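The proportional reasoning behind these weights can be sketched in code. The sketch is an illustration under assumptions, not Burgess and Cottrell’s own procedure: the function name is hypothetical, and it uses only the two percentages quoted for “rarely” and “occasionally” (18.2 and 8.0) together with the one-fourth ratio stated in the text.

```python
def proportional_weight(reference_weight, pct, reference_pct):
    """Scale a reference weight by the ratio of two percentages.

    The result is rounded to a whole number, as scoring weights are.
    """
    return round(reference_weight * pct / reference_pct)


if __name__ == "__main__":
    # "Rarely" stands at roughly one-fourth of "never" (weight 15).
    print(round(15 * 0.25))
    # "Occasionally" (8.0) relative to "rarely" (18.2, weight 4).
    print(proportional_weight(4, 8.0, 18.2))
```

The rounding step matters: the ratio 8.0 to 18.2 is somewhat less than one-half, yet the weight still comes out at 2, which matches the “roughly one-half” of the text.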

Third, using the scoring values we have just described, Burgess 
and Cottrell scored all the questionnaires and used the total scores 
as their measure of marital adjustment. The distribution of these 
adjustment scores is given in Table 109. 
Table 109. Distribution of Marriage Adjustment Scores for 526 Marriages*

Adjustment                            Cumulative
scores       Number    Per cent      percentage
190-199        19         3.6          100.0
180-189        51         9.7           96.5
170-179        82        15.6           86.8
160-169        74        14.1           71.2
150-159        50         9.5           57.1
140-149        32         6.1           47.6
130-139        41         7.8           41.5
120-129        33         6.3           33.7
110-119        25         4.7           27.4
100-109        20         3.8           22.7
 90- 99        23         4.4           18.9
 80- 89        19         3.6           14.5
 70- 79        16         3.0           10.9
 60- 69        21         4.0            7.9
 50- 59        12         2.3            3.9
 40- 49         5         1.0            1.6
 30- 39         2         0.4            0.6
 20- 29         1         0.2            0.2

Total         526       100.0

* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.
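The percentage and cumulative-percentage columns of Table 109 follow directly from the frequencies. As a minimal sketch (the variable and function names are illustrative), the columns can be recomputed from the counts as follows. Values computed from the raw counts may differ in the last digit from a column obtained by adding already rounded percentages.

```python
# Frequencies from Table 109, listed from the lowest score band upward.
BANDS = [
    ("20-29", 1), ("30-39", 2), ("40-49", 5), ("50-59", 12),
    ("60-69", 21), ("70-79", 16), ("80-89", 19), ("90-99", 23),
    ("100-109", 20), ("110-119", 25), ("120-129", 33), ("130-139", 41),
    ("140-149", 32), ("150-159", 50), ("160-169", 74), ("170-179", 82),
    ("180-189", 51), ("190-199", 19),
]


def cumulative_percentages(bands):
    """Return (band, per cent, cumulative per cent) for each score band."""
    total = sum(count for _, count in bands)
    running = 0
    rows = []
    for label, count in bands:
        running += count
        rows.append((label, 100.0 * count / total, 100.0 * running / total))
    return rows


if __name__ == "__main__":
    for label, pct, cum in cumulative_percentages(BANDS):
        print(f"{label:>8}  {pct:5.1f}  {cum:6.1f}")
```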


A number of correlations pertinent to the evaluation of these 
adjustment scores are given in Table 110. The first two of these 


correlations may, perhaps, be considered as measures of reliability, 
and the third may be considered a measure of validity. 


Table 110. Adjustment Score Correlates*

Variables                                                            Number     r
Correlation between scores for husband and wife                        66      .88
Correlation between adjustment scores and happiness ratings            68      .95
Correlation between adjustment scores and divorce or separation       526      .89

* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.

... through questions covering the cultural background of both husband
and wife, through questions on their psychogenetic characteristics,
through questions on their social characteristics, through questions
on economic factors, and through questions on response attitudes
and patterns.

For each possible predictor Burgess and Cottrell prepared a scatter 


diagram to show its relation to the marital adjustment scores. An 
example is given in Table 111. This table shows the relationship 


Table 111. The Relation between Marital Adjustment and Duration of Premarital Acquaintance*

                                     Marital adjustment
                                 Poor        Fair        Good       Mean
Period of acquaintance         per cent    per cent    per cent     adjustment score
Under six months                 47.0        30.6        22.4         120
Six to twenty-three months       37.7        24.6        37.7         132
Two to four years                27.8        28.4        43.8         141
Five years and more              14.7        32.0        53.3         153
Total                            28.5        28.3        43.2

* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.


between marital adjustment and duration of acquaintance prior to
marriage. The adjustment scores are trichotomized into good, fair,
and poor scores, and the duration of acquaintance is broken down
into four categories. We can read across any row in the table or
down any column to find a demonstration of the relationship which
the table is intended to portray. For example, we see in the first
column that as length of premarital acquaintance increases, the
percentage of poorly adjusted couples steadily decreases from 47
to 14.7 per cent. In the third column we see the converse: that the
percentage of couples with “good” adjustment steadily increases
from 22.4 to 53.3 per cent as length of premarital acquaintance
increases. Reading across the table in the top row, we find that the
percentages decrease as we go from couples with “poor” adjustment
to couples with “good” adjustment. And, in the last row of the
table, we find the converse: that the percentages increase as we go
from couples with “poor” adjustment to couples with “good”
adjustment.

Besides these scatter plots Burgess and Cottrell also computed
mean marital adjustment scores for subjects giving each of the 
possible responses to the various predictor items. Those for the 
various categories of duration of premarital acquaintance are given 
in column 4 of Table 111. They steadily increase from a minimum 
value of 120 to a maximum value of 153 as length of premarital 
acquaintance increases. 
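The trends read off Table 111 can be checked mechanically. The following is a sketch only: the names are illustrative, and the “poor” percentage for the second category (37.7) is taken as the value that makes its row total 100 per cent, which is an inference from the row rather than a figure quoted in the text.

```python
# Rows of Table 111: (category, poor %, fair %, good %, mean adjustment score).
TABLE_111 = [
    ("Under six months", 47.0, 30.6, 22.4, 120),
    # 37.7 for "poor" is inferred so that the row sums to 100 per cent.
    ("Six to twenty-three months", 37.7, 24.6, 37.7, 132),
    ("Two to four years", 27.8, 28.4, 43.8, 141),
    ("Five years and more", 14.7, 32.0, 53.3, 153),
]


def is_strictly_monotonic(values, increasing=True):
    """True if each value is greater (or smaller) than the one before it."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(later > earlier for earlier, later in pairs)
    return all(later < earlier for earlier, later in pairs)


if __name__ == "__main__":
    poor = [row[1] for row in TABLE_111]
    good = [row[3] for row in TABLE_111]
    means = [row[4] for row in TABLE_111]
    print(is_strictly_monotonic(poor, increasing=False))  # poor falls
    print(is_strictly_monotonic(good))                    # good rises
    print(is_strictly_monotonic(means))                   # means rise
```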

Processing their data in the manner we have just described, Bur- 
gess and Cottrell found 35 items with significant predictive value. 
Of these items, 21 were for husbands and 17 were for wives. Three 
of the items were useful for both husbands and wives. 

Scoring Weights. Burgess and Cottrell’s next problem was to 
assign scoring weights. For each predictor they assigned a maximum 
value to the response given by the highest proportion of couples 
with “good” adjustment scores and a value of 0 to the response 
given by the smallest proportion of couples with “good” adjustment 
scores. The magnitude of the maximum value was made to approxi- 
mate the difference between the highest and lowest proportions of 
couples with “good” adjustment scores, and intermediate values 
were assigned, say Burgess and Cottrell, “by inspection.” In short, 
this process was very much like the one illustrated in Table 108. 

When these weights had been determined, Burgess and Cottrell 
applied them to their predictor items and derived marital-happiness 
prediction scores for all of their subjects. These scores were found to 
have a reliability of .88, measured either by the intercorrelation 
between the scores for husband and wife or by a test-retest 
correlation. 
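A reliability figure of this kind is an ordinary product-moment correlation between two sets of scores, for example the prediction scores of husbands and those of their wives. A minimal sketch follows; the function name and the paired scores are hypothetical and are given only to show the computation.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 cases")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical prediction scores for five couples.
    husbands = [520, 610, 455, 700, 390]
    wives = [540, 590, 470, 680, 420]
    print(round(pearson_r(husbands, wives), 2))
```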

Significant Items. Several items which Burgess and Cottrell found 
to be predictive of marital happiness for men, together with the 
scoring weights for the alternative responses to them, are given in 
Table 112. The items for women are so similar in character that 
their reproduction would constitute a needless duplication. 

Validity. Burgess and Cottrell determined the validity of their
marital-happiness prediction scores in two ways: by correlating
them with the adjustment scores and by comparing the scores
secured by divorced, separated, and nondivorced couples.

Correlations with adjustment scores were found to be .51 and .48.
The first of these coefficients is based upon cases included in the
standardization group, so it cannot be taken seriously. The second
coefficient can be accepted, however. It is based upon 155 cases not
included in the original group.

Table 113 shows a comparison between the scores of divorced, 


separated, and nondivorced couples. It is clear that there are sub- 


Table 112. Several of the Items in Burgess and Cottrell’s Marital Adjustment
Prediction Scale for Men*


Scoring Weight 
1. Place in family: 
Only chilli. oara vig enois im inna KESAR aeea oaia r 0 
Oa i ccs riani aaa aan a RE a Hooves OH AS REE E B 


Middle child. . aig AAE VESE DEE S 


Youngest child... -00e ai 15 
ING Re AE E REO sari tiem na rHIASR ON GY 0 

2. Most attached to which sibling: : 
Only child 0 
No special attachment but has sibling y 
Older brother. ....--+++se0850000* t 
Older sister. o.oo eseis is serrera ne eee a % 
Younger brother... -s.e sig : 
Younger sister....-- or 
No reply.....--0eeee res 

3. Area of residence at time of marriage: g 
Chicago rooming house area... cecce ee S 
Chicago area of first settlement. ~- - a 
Chicago area of second settlement. ...+-++++s5s7850077" 5 : 
Chicago hotel area. ss «e o ee siana venntst t SEUTA Bap k 
Chicago apartment and apartment hotel area... iD 
Chicago private homes of better class....-+ : 5 
Chicago suburbs...0+.2+sseeenreret sterner senses eee tp 2 
Other City iy a s.eeeeeeene tset p“ 
Small town not a Chigagoisubut Di sssasse o vesens os i ns 
DEAD, «yi ar orana pee iS, SSRN ORRIN TEMALAR EE ; 
No e ma eadi we erent eee ak aaa ETE NES 

4. Education at marriag š 
Grades only t 
Hiphschooliu ssaa mapeo assa i SRA TNE 0 
Professional school but not college. -- - ig 
Callpa m onneni wear WENA MA 7 
Graduate or profession 0 

* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.

stantial differences between the groups and that these differences
are in the expected directions.

It is evident that Burgess and Cottrell derived their
marital-happiness prediction scores without reference to item
intercorrelations. Burgess and Cottrell were well aware that these
intercorrelations are important and that it would have been desirable
to take them into account. But to do this would have required much
more time and effort than Burgess and Cottrell felt they could devote
to the problem. Therefore they attempted a compromise by classifying
their predictor items into five categories: psychogenetic, cultural
impress, social type, economic role, and response patterns, and by
computing the intercorrelations among these five categories of items
rather than those among all individual items. The intercorrelations
among these five groups of items are given in Table 114. Burgess
and Cottrell used these intercorrelations and the correlation of each
group of items with the marital adjustment scores to determine a
multiple correlation. This multiple correlation turned out to be .56.

Table 113. Percentage Distribution of Prediction Scores for Those Who Are Divorced,
Are Separated, Have Contemplated Divorce or Separation, and Have Not
Contemplated Divorce or Separation*

                                                  Have contemplated   Have not contemplated
Prediction                                        divorce or          divorce or
score        Number    Divorced    Separated      separation          separation
700-779        11         0.0         0.0             9.1                 90.9
620-699        68         2.9         0.0             5.9                 91.2
540-619       139         2.9         4.3             6.5                 86.3
460-539       173        13.9        15.0            13.9                 57.2
380-459       100        25.0        17.0            16.0                 42.0
300-379        41        34.2        21.9            21.9                 21.9
220-299         8        50.0        37.5            12.5                  0.0
Total number  540        73          61              64                  342

* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.

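The percentages in Table 113 are computed within each prediction-score band, so the number of cases behind any cell can be recovered by multiplying the percentage by the number in the band. The sketch below does this for the column “have not contemplated divorce or separation”; the names are illustrative, and the figures are those of Table 113.

```python
# (prediction score band, number of cases, per cent who have not
# contemplated divorce or separation), as given in Table 113.
NOT_CONTEMPLATED = [
    ("700-779", 11, 90.9),
    ("620-699", 68, 91.2),
    ("540-619", 139, 86.3),
    ("460-539", 173, 57.2),
    ("380-459", 100, 42.0),
    ("300-379", 41, 21.9),
    ("220-299", 8, 0.0),
]


def implied_count(number, per_cent):
    """Whole number of cases implied by a percentage of a band."""
    return round(number * per_cent / 100)


if __name__ == "__main__":
    counts = [implied_count(n, pct) for _, n, pct in NOT_CONTEMPLATED]
    print(counts)
    print(sum(n for _, n, _ in NOT_CONTEMPLATED), sum(counts))
```

The recovered counts add up to the column total given in the last line of the table, which serves as a check on the individual percentages.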


Table 114. Intercorrelations of Scores among Five Prediction Areas*

                        Psycho-     Cultural    Social    Economic
Prediction areas        genetic     impress     type      role
Cultural impress          .30
Social type               .45         .47
Economic role             .29         .32        .53
Response patterns         .30         ...        .42        .34

* From Burgess, E. W., and Cottrell, L. S., Jr. Predicting Success or Failure in Marriage.
New York: Prentice-Hall, Inc., 1939.

This represents only a slight increase over the value of .51 previously reported.

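Why a multiple correlation built from correlated groups of items gains so little can be seen in the simplest case of two predictors, where R² = (r1² + r2² - 2·r1·r2·r12) / (1 - r12²). The sketch below is an illustration only, not Burgess and Cottrell’s five-area computation: the two validity coefficients are hypothetical, and only the intercorrelation of .30 is taken from Table 114.

```python
import math


def multiple_r_two_predictors(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1 and r2 are the correlations of each predictor with the criterion;
    r12 is the correlation between the two predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # HYPOTHETICAL validities of .40 and .35; r12 = .30 as in Table 114
    # for the psychogenetic and cultural-impress areas.
    print(round(multiple_r_two_predictors(0.40, 0.35, 0.30), 3))
    # The same validities with uncorrelated predictors, for comparison.
    print(round(multiple_r_two_predictors(0.40, 0.35, 0.0), 3))
```

With positively correlated predictors the second figure is the larger, which is the general point: overlap among the prediction areas limits what their combination can add.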
Contingency Factors. Burgess and Cottrell point out that there
are a number of factors not included either in their adjustment scale
... contingency factors and take place or occur
after marriage. Such items are number of years married, size of
community, distance of residence from Chicago, characteristics of 
neighborhood, type of residence (number of rooms, and whether 
home is owned, is being rented, or is being purchased), amount of 
rent, changes in residence, relatives (living with them or not, and 
visits to or from), children (number of, and desire for), average 
length of employment in each position held since marriage, extent of 
unemployment, and financial status. 
Burgess and Cottrell found that all items except “residence after 
marriage with relatives, number of rooms in residence and monthly 
rent per room” were significant in relation to marital adjustment. 
Therefore it is important for us to realize that individuals having the 
same marital-happiness prediction scores may end up with different 
marital adjustment scores, and this by itself can be taken in no way 
as evidence for lack of validity in the prediction scores. One person 
may run into one set of contingency factors and a second person 
may run into a second set, and these two sets of contingency factors 
operate to produce a difference in what had originally been predicted
to be equal degrees of marital adjustment. Burgess and Cottrell 
content themselves with discussing this problem but do not make 
any attempt to include the effects of contingency factors in their 
predictions of adjustment from their marital happiness scale. 


TERMAN 


Our second example of a prognostic approach to the measurement
of adjustment also lies in the field of marital happiness. It consists
of the scale developed by Terman and his collaborators at Stanford
University. We shall find that Terman’s study differs in several
important respects from Burgess and Cottrell’s study, and it will be
important for us to evaluate these differences from the standpoint
of personality measurement. Before we can make these evaluations,
however, we must review the steps involved in Terman’s study.

Terman, like Burgess and Cottrell, was interested in the measurement
and prediction of marital happiness. “We have selected as the
theme of our study,” says Terman, “that aspect of the successful 
marriage which may be designated as marital happiness, and we 
wish to ascertain, if possible, what psychological factors are demon- 
strably associated with this state.” 

This purpose necessitates, as it did in Burgess and Cottrell’s study, 
a measure of marital happiness. This measure of happiness must be 
used as the standard to which all premarital predictor items must be 
related. Terman borrowed most of his items from the marital adjust- 
ment scale constructed by Burgess and Cottrell. Thus Terman’s 
criteria of marital happiness are “subjective ratings of the happiness 
of the marriage, and factual information on husband-wife agreement 
or disagreement about various matters, on methods used in resolving 
disagreements, on specific things in the marriage that are unsatis- 
factory, on regret over the choice of a mate, and on consideration 
that may have been given to separation or divorce.” The way Ter- 
man requested these ratings (from his respondents) differed in only 
minor ways from those used by Burgess and Cottrell. 

Terman, in the opening chapter of his book Psychological Factors
in Marital Happiness, lists several points of methodological
technique which require consideration. These are that the researcher be
fully cognizant of the pitfalls of sampling, of the fact that he must
design his questionnaire to elicit the kind of information that a
respondent will be willing to give, of the importance of getting
complete answers from each subject, of the necessity of securing
data in a way that makes them amenable to statistical treatment,
and last, but certainly not least, of the caution and care needed in
the interpretation of his findings.

Terman’s questionnaire was divided into seven parts. Part I
consisted of 71 items taken from the Bernreuter Personality Inventory.
Part II consisted of 128 items taken from the Strong Vocational
Interest Test. Part III consisted of 34 opinion items. Part IV
contained the items for the criterion index of marital happiness.
Part V dealt with childhood background and Part VI with
sex adjustment. And Part VII consisted of 25 miscellaneous items.


Subjects. Terman secured his data through personal contact. The
procedure was for Terman, or for one of his associates, to address a
group meeting and then to ask for volunteers to stay after the
talk to complete the questionnaires. Several precautions were taken
to guarantee anonymity. First, it was explained to the subjects that the
schedules could be completed by drawing circles around preprinted
answers or by inserting check marks in appropriate places, the
point being that no telltale handwriting would be required. Second,
the schedules were not to be signed. Third, all schedules were to be
mingled with hundreds of others so that the identity of each schedule 
would be lost in the general calculations. Subjects were asked to come 
to the front of the lecture-room (husband and wife were to come 
forward together) and to take any envelope they chose out of a large 
basket marked A. This envelope contained two smaller unsealed 
envelopes with one schedule for the husband and one for the wife. 
The husband took his envelope and schedule to one side of the room, 
and the wife took her envelope and schedule to the other side of the 
room. When the husband completed his schedule, he inserted it in 
the appropriate envelope and sealed it. The wife did the same thing
with hers. Then both husband and wife came to the front of the
room, put their two sealed envelopes together in a larger envelope,
sealed this larger envelope, and dropped it into a basket marked B.
This procedure assured anonymity, prevented collaboration, and
made it possible to keep the schedules of a husband-wife pair
together. Terman justifiably claims that no other investigator has
availed himself of a technique so airtight with respect to the prevention
of collaboration and to the assurance of anonymity to the respondents.

Terman secured completed schedules from 792 couples. These
couples came from the middle and upper-middle class of urban and
semiurban Californians. Over 50 per cent of the husbands were
employed in business or in the professions. The average school
grade completed was 14, and 48 per cent of the husbands and 38
per cent of the wives had graduated from college. The mean age of
the husbands was 39 and that of the wives was 36. The average
length of time married was slightly more than eleven years but
one-third of the couples still had no children.
Marital Happiness Scale. The items which Terman used as a
basis for constructing a criterion index of marital happiness are
presented in Table 115. We also give in this table the scoring weights
and the distribution of answers among the alternate item-response
categories. The items show variable intercorrelations with each
other, ranging from a low of .22 to a high of .84. Their average
intercorrelation is .57. These intercorrelations make it evident that some
factor underlies all items, and this common factor Terman assumes
is, of course, the variable marital happiness.


TABLE 115. Distributions of Answers to Questions in Terman’s Marital Happiness
Scale*

Item                                                 | Husbands (%) | Wives (%) | Weight

Do you and your wife (husband) engage in outside interests together?
    All of them                                      |     13.7     |   19.8    |   7
    Most of them                                     |     54.6     |   55.9    |   5
    Some of them                                     |     26.7     |   18.7    |   3
    Very few of them                                 |      4.9     |    4.4    |   1
    None of them                                     |      0.1     |    1.2    |   0

Approximate extent of agreement or disagreement on:

  Handling family finances
    Always agree                                     |     29.2     |   34.6    |   8†
    Almost always agree                              |     46.8     |   44.2    |   6
    Occasionally disagree                            |     19.6     |   14.0    |   4
    Frequently disagree                              |     ...      |    5.8    |   2
    Almost always disagree                           |      0.5     |   ...     |   1
    Always disagree                                  |      0.4     |    0.8    |   0

  Matters of recreation
    Always agree                                     |     19.2     |   24.1    |
    Almost always agree                              |     57.8     |   53.1    |
    Occasionally disagree                            |     18.1     |   18.6    |
    Frequently disagree                              |      4.2     |    2.9    |
    Almost always disagree                           |      0.6     |    1.0    |
    Always disagree                                  |      0.1     |    0.3    |

  Religious matters
    Always agree                                     |     46.6     |   49.6    |
    Almost always agree                              |     38.1     |   34.6    |
    Occasionally disagree                            |     10.9     |   11.1    |
    Frequently disagree                              |      2.7     |    2.8    |
    Almost always disagree                           |      0.8     |    0.8    |
    Always disagree                                  |      0.9     |    1.1    |

  Friends
    Always agree                                     |     ...      |   ...     |
    Almost always agree                              |     ...      |   ...     |
    Occasionally disagree                            |     ...      |   ...     |
    Frequently disagree                              |     ...      |   ...     |
    Almost always disagree                           |     ...      |   ...     |
    Always disagree                                  |     ...      |   ...     |

  Caring for the children
    Always agree                                     |     25.7     |   31.2    |
    Almost always agree                              |     46.9     |   41.6    |
    Occasionally disagree                            |     21.8     |   21.4    |
    Frequently disagree                              |      4.4     |    4.4    |
    Almost always disagree                           |      0.9     |    1.0    |
    Always disagree                                  |      0.3     |    0.3    |

  Table manners
    Always agree                                     |     36.4     |   41.6    |
    Almost always agree                              |     42.5     |   36.1    |
    Occasionally disagree                            |     17.4     |   ...     |
    Frequently disagree                              |     ...      |   ...     |
    Almost always disagree                           |     ...      |   ...     |
    Always disagree                                  |     ...      |   ...     |

  Matters of conventionality
    Always agree                                     |     23.9     |   29.8    |
    Almost always agree                              |     47.9     |   42.8    |
    Occasionally disagree                            |     23.3     |   21.3    |
    Frequently disagree                              |     ...      |   ...     |
    Almost always disagree                           |     ...      |   ...     |
    Always disagree                                  |     ...      |   ...     |

  ...
    Always agree                                     |     18.7     |   ...     |
    Almost always agree                              |     ...      |   ...     |
    Occasionally disagree                            |     ...      |   ...     |
    Frequently disagree                              |     ...      |   ...     |
    Almost always disagree                           |     ...      |   ...     |
    Always disagree                                  |     ...      |   ...     |

When disagreements arise, they usually result in:
    You giving in                                    |     ...      |   ...     |  ...
    Your wife (husband) giving in                    |     ...      |   ...     |  ...
    Mutual give and take                             |     ...      |   ...     |  ...

Do you ever regret your marriage?
    Frequently                                       |     ...      |    3.5    |   0
    Occasionally                                     |     ...      |   12.8    |   4
    Rarely                                           |     28.9     |   25.3    |   7
    Never                                            |     ...      |   ...     |  ...

If you had your life to live over do you think you would:
    Marry the same person                            |     ...      |   86.1    |  10
    Marry a different person                         |     ...      |   10.4    |   0
    Not marry at all                                 |     ...      |    3.5    |   0

Have you ever seriously contemplated separation?
    Yes                                              |     16.3     |   21.0    |   0‡
    No                                               |     ...      |   79.0    |   8§

Have you ever seriously contemplated divorce?
    Yes                                              |     ...      |   ...     |
    No                                               |     ...      |   ...     |

Everything considered, how happy has your marriage been?
    Extraordinarily happy                            |     ...      |   34.6    |  15
    Decidedly more happy than average                |     36.8     |   35.9    |  12
    Somewhat more happy than average                 |     16.3     |   14.7    |   9
    About average                                    |     12.9     |    9.2    |   6
    Somewhat less happy than average                 |      2.9     |    3.0    |   3
    Decidedly less happy than average                |      1.6     |    1.8    |   0
    Extremely unhappy                                |      0.1     |    0.8    |   0

If your marriage is now unhappy, how long has that been true?
    Unanswered                                       |     92.1     |   90.6    |  11
    One year or more specified                       |      7.9     |    9.4    |   0

Complaint score (computed by counting 1 for each annoyance
circled 1 and 2 for each annoyance circled 2. Annoyances
circled 0 not counted)||
    ...                                              |     38.2     |   43.1    |  13
    ...                                              |     22.4     |   20.3    |  11
    ...                                              |     20.5     |   22.1    |   9
    ...                                              |     13.0     |    8.7    |   6
    ...                                              |      3.9     |    3.2    |   3
    30 or over                                       |      2.0     |    2.6    |   0

* From Terman, L. M. Psychological Factors in Marital Happiness. New York:
McGraw-Hill Book Company, Inc., 1938.

† These weights were assigned in accord with the average amount of agreement
on this and the following nine items.

‡ A weight of 0 was assigned for an answer of “Yes” to this or the following
question.

§ A weight of 8 was assigned for an answer of “No” to both this and the
following question.

|| The instructions preceding the list of complaints called for circling of “0”
if the thing mentioned was present in the marriage but had not interfered with
happiness, “1” if the thing had made the marriage less happy than it should be,
and “2” for things that had done most to make the marriage unhappy.


Item Weights. To weight the items for scoring, two criteria were 
considered: “the average magnitude of the correlation of the item 
with each of the other eight items, and the size of the husband-wife 
correlation for the item in question.” Terman says that “items which 
showed the highest intercorrelation received greatest weight on the 
supposition that they were most heavily saturated with the general 
happiness factor and hence more valid indicators of the trait to be 
measured. Account was taken of the husband-wife correlation on 
the ground that this was a rough indication of reliability.” 

Scoring values were assigned to the various response categories in 
such a way that the standard deviations of the distribution of 
answers would be proportional to the weight desired. When the 
blanks for the 792 couples were scored with the weights given in 
Table 115, the distributions presented in Table 116 were obtained. 
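
The mechanics of scoring can be illustrated with a short sketch. The Python below is only an illustration and not Terman's own procedure: it uses three of the items and the weights printed for them in Table 115, while the dictionary layout, the function name, and the sample respondent are assumptions made for the example. A complete happiness score would be the same kind of sum taken over every item in the scale.

```python
# Weights for three of the items in Table 115.
# The data layout and all names here are illustrative assumptions.
ITEM_WEIGHTS = {
    "outside_interests": {
        "all of them": 7,
        "most of them": 5,
        "some of them": 3,
        "very few of them": 1,
        "none of them": 0,
    },
    "life_to_live_over": {
        "marry the same person": 10,
        "marry a different person": 0,
        "not marry at all": 0,
    },
    "happiness_rating": {
        "extraordinarily happy": 15,
        "decidedly more happy than average": 12,
        "somewhat more happy than average": 9,
        "about average": 6,
        "somewhat less happy than average": 3,
        "decidedly less happy than average": 0,
        "extremely unhappy": 0,
    },
}


def partial_happiness_score(answers):
    """Add up the weight of the response chosen for each answered item."""
    return sum(ITEM_WEIGHTS[item][choice] for item, choice in answers.items())


# A hypothetical respondent, not a case from Terman's data.
respondent = {
    "outside_interests": "most of them",
    "life_to_live_over": "marry the same person",
    "happiness_rating": "decidedly more happy than average",
}

print(partial_happiness_score(respondent))
```

Because each response category carries a fixed weight, two respondents can reach the same total by quite different routes, which is one reason the item intercorrelations matter in judging what the total means.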


TABLE 116. Distribution of Happiness Scores*

Happiness score | Husbands | Wives
     85-87      |    72    |  101
     80-84      |   138    |  172
     75-79      |   161    |  143
     70-74      |   160    |  ...
     65-69      |    68    |   66
     60-64      |    50    |   40
     55-59      |    39    |   32
     50-54      |    41    |   33
     45-49      |    23    |   15
     40-44      |    14    |   17
     35-39      |     6    |   17
     30-34      |    13    |   13
     25-29      |    10    |   10
     20-24      |    10    |    5
     15-19      |    10    |   12
     10-14      |     7    |    7
      5-9       |     3    |   10
      0-4       |     1    |    3

* From Terman, L. M. Psychological Factors in Marital Happiness. New York:
McGraw-Hill Book Company, Inc., 1938.


These distributions are decidedly nonnormal in appearance. There
is a great preponderance or piling-up of the scores at the upper
(happy) ends of the distribution.

Nonnormality. Terman discusses three possible causes of this
nonnormality. First, he posits inequality of the scale units at
different parts of the scale. To explain the skewness, this would require 
the assumption that the “units of happiness” at the upper end of 
the scale are much broader than those at the lower end of the scale. 
This would make for a bunching of the top ratings and for a spread- 
ing of the poorer ratings. This inequality of scale unit cannot, of 
course, be proved, but it is assumed upon the basis of a fact in 
psychophysics: that the units of a psychological measuring scale are 
proportional to the number of cases falling within it. When a trait 
is distributed in a normal manner, we require fewer cases at each 
extreme than we do in the middle part of the range in order to indi- 
cate a unit of equal length. Therefore when we see a bunching up, as 
we do in the marital happiness distributions, we assume that the 
intervals in part of the distribution are not sufficiently narrow in 
scope to allow for an adequate differentiation among the individuals 
in question. 

The second factor Terman mentions is that of selection. Normal
distributions are to be expected only for sample populations drawn
at random with respect to the variable being investigated. In the
present case this has not been done. Terman’s subjects, for the most
part, were secured via the medium of lecture and group meetings.
Since both husband and wife were to participate, it was necessary
that both be in attendance at the meeting. The only couples who
attend such meetings are those sufficiently well adjusted to each
other to attend group functions together. Thus the proportion of
seriously unhappy marriages represented in Terman’s sample is
undoubtedly far short of that in the general population.


The third factor Terman mentions is generosity. This tendency is
present in almost every rating scheme that has been devised, and
it is particularly strong when one has the task of rating himself.
There is certainly no reason to suppose that this factor has not
contributed its share to the skewness of the marital happiness
distributions.
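
The piling-up of scores at the happy end can be expressed as a coefficient of skewness. The sketch below is an illustration, not a computation taken from Terman: it uses the husbands' frequencies as they appear in Table 116, and it assumes that each class interval can be represented by its midpoint and that the intervals run in steps of five points below the 85-87 class. A negative result indicates a long tail toward the unhappy end of the scale.

```python
# Husbands' frequencies from Table 116, highest class interval first.
# Assumption: each interval is represented by its midpoint, and the
# intervals step down by five points (80-84, 75-79, ..., 0-4).
MIDPOINTS = [86, 82, 77, 72, 67, 62, 57, 52, 47, 42, 37, 32, 27, 22, 17, 12, 7, 2]
HUSBANDS = [72, 138, 161, 160, 68, 50, 39, 41, 23, 14, 6, 13, 10, 10, 10, 7, 3, 1]


def grouped_skewness(midpoints, freqs):
    """Moment coefficient of skewness (m3 / m2**1.5) for grouped data."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    m2 = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / n
    m3 = sum(f * (m - mean) ** 3 for m, f in zip(midpoints, freqs)) / n
    return m3 / m2 ** 1.5


skew = grouped_skewness(MIDPOINTS, HUSBANDS)
print(round(skew, 2))  # a negative value: scores bunch at the high end
```

The coefficient describes the shape of the distribution only. It cannot tell us which of Terman's three explanations (unequal units, selection, or generosity) is responsible for that shape.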

Item Correlations. The extent to which each item in the happiness
scale makes a contribution to it can be seen by reference to the
correlations presented in Table 117. This table shows the extent to
which each item correlates with total happiness scores. To keep these
correlations from containing a spurious element, the part of the
total score due to the item in question was subtracted from it, so
that the correlation between an item and the total score indicates
its correlation with that part of the total score not due to the item in
question.


TABLE 117. Correlations between Marital Happiness Scores and Responses to
Individual Items*

Item                                              | Husband | Wife | Combined

Degree of outside interests in common
    Husband’s answer                              |   ...   |      |   ...
    Wife’s answer                                 |         | .56  |   .51

Degree of agreement on:
  Family finances
    Husband’s answer                              |   ...   |      |   ...
    Wife’s answer                                 |         | ...  |   ...
  Recreation
    Husband’s answer                              |   .46   |      |   .43
    Wife’s answer                                 |         | .59  |   ...
  Religion
    Husband’s answer                              |   .30   |      |   .30
    Wife’s answer                                 |         | ...  |   ...
  Demonstrations of affection
    Husband’s answer                              |   .58   |      |   ...
    Wife’s answer                                 |         | .60  |   ...
  Friends
    Husband’s answer                              |   ...   |      |   .43
    Wife’s answer                                 |         | .50  |   .49
  Care of children
    Husband’s answer                              |   .45   |      |   .41
    Wife’s answer                                 |         | .53  |   .50
  Table manners
    Husband’s answer                              |   .38   |      |   .36
    Wife’s answer                                 |         | .42  |   .38
  Matters of convention
    Husband’s answer                              |   .40   |      |   .38
    Wife’s answer                                 |         | .44  |   .39
  Philosophy of life
    Husband’s answer                              |   .45   |      |   .42
    Wife’s answer                                 |         | .57  |   .52
  Dealing with in-laws
    Husband’s answer                              |   .40   |      |   .36
    Wife’s answer                                 |         | .40  |   .38

Average degree of agreement on above 10 items
    Husband’s answer                              |   ...   |      |   ...
    Wife’s answer                                 |         | .63  |   .60

Degree of reciprocity in settling disputes
    Husband’s answer                              |   ...   |      |   ...
    Wife’s answer                                 |         | ...  |   ...
    Combined answer                               |         |      |   ...

Lack of regret over marriage
    Husband’s answer                              |   .76   |      |   ...
    Wife’s answer                                 |         | ...  |   ...

Satisfaction with choice of mate
    Husband’s answer                              |   .82   |      |
    Wife’s answer                                 |         | .82  |
    Combined answer                               |         |      |   .85

No contemplation of separation or divorce
    Husband’s answer                              |   .76   |      |
    Wife’s answer                                 |         | .82  |
    Combined answer                               |         |      |   .74

Degree of happiness of marriage (self-rating)
    Husband’s answer                              |   .76   |      |   ...
    Wife’s answer                                 |         | .78  |   ...

No admission of unhappiness
    Husband’s answer                              |   .80   |      |
    Wife’s answer                                 |         | .83  |
    Combined answer                               |         |      |   .85

Total of husband’s complaints                     |   .66   | ...  |   .62
Total of wife’s complaints                        |   ...   | .69  |   ...
Sum total of husband’s and wife’s complaints      |   .67   | .68  |   .81

* From Terman, L. M. Psychological Factors in Marital Happiness. New York:
McGraw-Hill Book Company, Inc., 1938.
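
The correction for the spurious element can be shown with a small sketch. It is an illustration under stated assumptions and not Terman's computation: the four respondents are hypothetical, and the function names are our own. The sketch correlates an item first with the total score that includes it and then with the remainder of the total after the item's own contribution has been subtracted.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(item_scores, total_scores):
    """Correlate an item with the total score minus that item's own part."""
    remainder = [total - item for item, total in zip(item_scores, total_scores)]
    return pearson(item_scores, remainder)


# Hypothetical data: the item is unrelated to the rest of the scale,
# yet it is counted in the total.
item = [0, 1, 0, 1]
rest_of_scale = [1, 1, 2, 2]
total = [i + r for i, r in zip(item, rest_of_scale)]

raw = pearson(item, total)                    # inflated by the item itself
corrected = corrected_item_total(item, total)  # item against the remainder
print(round(raw, 2), round(corrected, 2))
```

In this artificial case the uncorrected coefficient is substantial even though the item has no relation to the other items, which is exactly the spurious element the subtraction is meant to remove.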

Terman found that the correlation between the marital happiness
ratings and the marital happiness scores was .76 for husbands and
.78 for wives. These are both lower than the value of .95 reported by
Burgess and Cottrell. This discrepancy can be attributed to two
facts: a tetrachoric coefficient, the type used by Burgess and Cottrell,
usually provides a higher estimate than a Pearson correlation, and
in the Burgess and Cottrell study the basic criterion consisted of
nothing more than the happiness ratings. In Terman’s study the
items were included or excluded upon the basis of their
intercorrelation with all other items in the scale, not alone upon the basis of their
correlation with the marital happiness ratings.

Terman gives three reasons for not wishing to accept self-happiness
ratings as a sole or chief criterion. These are the known
unreliability of such ratings, the complex character of the variable
being rated, and the extreme skewness in the distribution. Terman
felt, as did Burgess and Cottrell, that a greater number of
discriminations, more normal distributions, and greater score
reliability would be secured by using data concerning areas of
agreement and disagreement, satisfaction and dissatisfaction, and
so forth, in addition to the happiness ratings.

Terman derived and used throughout his study separate happiness
scores for husbands and wives. This contrasts with the technique of
Burgess and Cottrell, who concerned themselves with the happiness
of the marriage and not with that of each individual partner.
Terman found a correlation of .59 between the happiness scores of
husbands and wives. Thus, either the scale cannot be considered a
very reliable index of the happiness of a marriage (à la Burgess and
Cottrell) or else the happiness of the two spouses can vary in
considerable degree.

The next step after the construction of the marital happiness index
was the determination of its personality, background, and sexual
correlates. The ways in which the values of these correlates were
determined differ from one area to another, so we shall have to
discuss them separately.

Personality Correlates. Terman investigated 233 personality
variables. Of these variables, 71 were taken from the Bernreuter
Personality Inventory, 128 were taken from the Strong Vocational
Interest Test, and the remaining 34 items were designed to gain
knowledge of opinions about the ideal marriage. As we already know,
the Bernreuter items can be answered by “Yes,” “No,” or “?,”
and the Strong items by L, I, or D. The opinion items, which are
new to us, allowed for five alternative responses.

The first step in the analysis of the personality correlates of marital
happiness consisted in the picking of two matched criterion groups of
happy and unhappy couples. The first group selected were the 150
couples with the lowest combined happiness scores. Then 300 happy
couples were matched with these 150 unhappy couples. These 300
couples had, of course, high happiness scores, but they were selected
to have the same average age, the same average number of years
married, the same average number of years of schooling, and the
same average occupational status as the 150 unhappy couples. The
husbands in each group averaged 39 years in age and [...].
Both groups had been married, on the average (median), [...]
years and had had a little over fourteen years of schooling. This
matching made it certain that differences in the personality
correlates to be investigated could not be attributed to differences in
age, length of time married, education, or occupation.

Terman studied his personality variables in two ways. He
compared the happy and unhappy groups with respect to the proportion
of each group giving answers in different item-response categories,
and he compared the tetrachoric correlations expressing
husband-wife agreement in the happy group with those in the unhappy
group. The data in Table 118 illustrate how these techniques of
comparison were applied. We find, in this table, that the proportion
of couples preferring a play to a dance is greater for happy couples
than it is for unhappy couples, and that husband-wife agreement is
greater for happy couples than it is for unhappy couples.


TABLE 118. Analysis of Personality Items*

Item                                  | Happy | Unhappy | Critical ratio | Weight

Do you prefer a play to a dance?
  Husbands:
    Yes                               |  ...  |  63.3   |      2.4       |   1
    No                                |  ...  |  26.0   |     -1.6       |
  Wives:
    Yes                               |  ...  |  58.0   |      2.9       |   1
    No                                |  ...  |  30.0   |     -1.9       |
  Husband-wife agreement              |  ...  |  ...    |      ...       |   2

* From Terman, L. M. Psychological Factors in Marital Happiness. New York:
McGraw-Hill Book Company, Inc., 1938.

Terman retained for his marital happiness prediction scale all
items showing critical ratios of 1.5 or more. Each retained item was
given two weights. One was based upon the difference
between proportions and the other upon the husband-wife
agreement correlations. In each case the weight was a whole
number (0, 1, or 2), so that the total weight for an item could range
from 0 to 4.

Terman generally assigned a weight of 0 when the critical ratio
was below 1.5, a weight of 1 when it was between 1.5 and 2.9, and a
weight of 2 when the critical ratio was 3.0 or higher. The assignment
of these weights was not carried out in a strictly mechanical manner,
however, as Terman tempered many of them with his own
common-sense judgment.
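
The weighting rule can be sketched in a few lines. The Python below is an illustration under stated assumptions and not a reproduction of Terman's arithmetic: the standard error of the difference is taken from the usual unpooled formula for two independent proportions, the proportions in the example are hypothetical, and the rule is applied to the size of the critical ratio without regard to sign. The group sizes of 300 and 150 are those of the matched happy and unhappy groups.

```python
from math import sqrt


def critical_ratio(p1, n1, p2, n2):
    """Difference between two proportions divided by its standard error.

    Assumption: the unpooled standard error of a difference between
    two independent proportions.
    """
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


def weight_from_critical_ratio(cr):
    """Mechanical form of the rule: 0 below 1.5, 1 up to 2.9, 2 from 3.0 on."""
    size = abs(cr)  # assumption: the sign of the ratio is ignored
    if size < 1.5:
        return 0
    if size < 3.0:
        return 1
    return 2


# Hypothetical proportions answering "Yes" in each group.
cr = critical_ratio(0.75, 300, 0.633, 150)
print(round(cr, 2), weight_from_critical_ratio(cr))
```

A rule of this kind gives only the starting value for a weight, since Terman adjusted many of the weights by judgment afterward.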
Terman retained 132 of the 233 personality items for his prediction
scale. There were 54 Bernreuter items, 54 Strong interest items, and
24 opinion items. From these items Terman prepared the following
characterizations of the temperaments of happy and unhappy
husbands and of happy and unhappy wives.


Happily Married Men. Happily married men show evidence of an even and
stable emotional tone. Their most characteristic reaction to others is that of
cooperation. This is reflected in their attitudes toward business superiors, with whom they
work well; in their attitude toward women, which reflects equalitarian ideals; and
in their benevolent attitudes toward inferiors and underprivileged. In a gathering
of people they tend to be unself-conscious and somewhat extroverted. As compared
with unhappy husbands, they show superior initiative, a greater tendency to take
responsibility, and greater willingness to give close attention to detail in their daily
work. They like methodological procedures and methodical people. In money
matters they are saving and cautious. Conservative attitudes are strongly
characteristic of them. They usually have a favorable attitude toward religion and strongly
uphold the sex mores and other social conventions.

Unhappily Married Men. Unhappy husbands, on the other hand, are inclined to
be moody and somewhat neurotic. They are prone to feelings of social inferiority,
dislike being conspicuous in public, and are highly reactive to social opinion. This
sense of social insecurity is often compensated by domineering attitudes in
relationships where they feel superior. They take pleasure in commanding roles over
business dependents and women, but they withdraw from a situation which would
require them to play an inferior role or to compete with superiors. They often
compensate this withdrawal by daydreams and power fantasies. More often than
happy husbands they are sporadic and irregular in their habits of work, dislike
detail and the methodical attitude, dislike saving money, and like to wager. They
more often express irreligious attitudes and are more inclined to radicalism in sex
morals and politics.

Happily Married Women. Happily married women, as a group, are characterized
by kindly attitudes toward others and by the expectation of kindly attitudes in
return. They do not easily take offense and are not unduly concerned about the
impressions they make upon others. They do not look upon social relationships as
rivalry situations. They are cooperative, do not object to subordinate roles, and
are not annoyed by advice from others. Missionary and ministering attitudes are
frequently evidenced in their responses. They enjoy activities that bring
educational or pleasurable opportunities to others and like to do things for the dependent
or underprivileged. They are methodical and painstaking in their work, attentive
to detail, and careful in regard to money. In religion, morals, and politics they tend
to be conservative and conventional. Their expressed attitudes imply a quiet
self-assurance and a decidedly optimistic outlook upon life.


Unhappily Married Women. Unhappily married women, on the other hand, are
characterized by emotional tenseness and by ups and downs of moods. They give
evidence of deep-seated inferiority feelings to which they react by aggressive
attitudes rather than by timidity. They are inclined to be irritable and dictatorial.
Compensatory mechanisms resulting in restive striving are common. These are seen
in the tendency of the unhappy wives to be active “joiners,” aggressive in business,
and overanxious in social life. They strive for wide circles of acquaintances but are
more concerned with being important than with being liked. They are egocentric
and little interested in benevolent and welfare activities, except in so far
as these offer opportunities for personal recognition. They also like activities
that are fraught with opportunities for romance. They are more inclined to
be conciliatory in their attitudes toward men than toward women and show
little of the sex antagonism that unhappily married men exhibit. They are impatient
and fitful workers, dislike cautious or methodical people, and dislike types of work
that require methodical and painstaking effort. In politics, religion, and social
ethics they are more often radical than happily married women.


Background Correlates. The background items which Terman
investigated as possible predictors of marital happiness are given
in Table 119.


TABLE 119. Background Items Investigated by Terman

 1. Husband’s occupation
 2. Income
 3. Presence or absence of children
 4. Present age
 5. Length of marriage
 6. Age at marriage
 7. Age differences
 8. Number of years of schooling
 9. Differences in years of schooling
10. Relative mental ability
11. Acquaintance before marriage
12. Length of engagement
13. Marital happiness of parents
14. Sibling relationships
15. Conflict with and attachment to parents
16. Physical appearance of parents
17. Childhood happiness
18. Home discipline and punishment
19. Religious training
20. Sex education
21. Childhood curiosity about sex
22. Premarital attitudes toward sex
23. Sexual shock
24. Age of first menstruation
25. Adolescent petting
26. Association with opposite sex during adolescence
27. Desire to be of opposite sex


For most of these items the relation to marital happiness was 
determined by the critical ratio technique, but, in addition, for many 
of the items Terman computed a tetrachoric or Pearsonian coeffi- 
cient of correlation. Most of the items have predictive value, if they 
have predictive value at all, at a very low level. And therefore most 
of the items listed above need not be discussed. We shall pick out 
some of the more important items, however, and use them to illus- 
trate the types of analyses which were involved. 

Length of Marriage. Table 120 shows the mean happiness scores
of husbands and wives according to number of years married. It
starts at values of 73 for husbands and 74 for wives and drops to
values of 65 for husbands and 67 for wives for six to eight years and
then gradually increases for twenty-five years or more. These trends
appear to be reasonable and to be in accord with our common-sense
judgment that the honeymoon years should be happier than those
several years later but that after this initial decline there should be
greater understanding and happiness which should mature through
the years.


TABLE 120. Mean Happiness Scores According to Length of Marriage*

Years         | Number | Husband | Wife
0-2           |  110   |  73.0   | 74.2
3-5           |  103   |  68.5   | 69.2
6-8           |  142   |  65.1   | 66.9
9-11          |   92   |  68.4   | 69.0
12-14         |  116   |  67.7   | 68.5
15-17         |   73   |  65.9   | 65.5
18-20         |   64   |  67.6   | 67.4
21-23         |   32   |  71.3   | 69.9
24-26         |   26   |  68.9   | 70.5
27 or over    |   34   |  69.4   | 70.3

* From Terman, L. M. Psychological Factors in Marital Happiness. New York:
McGraw-Hill Book Company, Inc., 1938.


There is, then, a relationship between happiness and
length of marriage which the correlation coefficients of -.03 and
.05 do not reveal. This is because the relationship is curvilinear, and
for such relationships Pearsonian coefficients of correlation are not
appropriate. To note what we mean by a curvilinear relationship,
let us first understand that a linear relationship demands a steady
and consistent change in one direction on one variable for
corresponding changes on the other. In the present case a linear
relationship would mean a consistent and steady increase or decrease in
happiness scores for each additional year of marriage. Instead of
this we find an initial decline followed by a more gradual increase in
mean scores for each additional year of marriage. In other words the
relationship is consistent, but the direction changes. This is what
we mean by a nonlinear or a curvilinear relationship.
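
The way a product-moment coefficient can miss a curvilinear trend may be seen in a short sketch. It is an illustration only: it works from the ten group means for husbands in Table 120 rather than from the scores of the individual couples, it gives every group equal weight, and it uses the order of the groups (shortest marriages first) in place of the actual number of years, all of which are simplifying assumptions. The coefficients it prints are therefore not the ones Terman reported.

```python
from math import sqrt

# Mean happiness scores of husbands for the ten successive
# length-of-marriage groups in Table 120, shortest marriages first.
HUSBAND_MEANS = [73.0, 68.5, 65.1, 68.4, 67.7, 65.9, 67.6, 71.3, 68.9, 69.4]


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


group_order = list(range(len(HUSBAND_MEANS)))  # 0 = most recently married

overall = pearson(group_order, HUSBAND_MEANS)        # whole range at once
early = pearson(group_order[:3], HUSBAND_MEANS[:3])  # the initial decline
later = pearson(group_order[2:], HUSBAND_MEANS[2:])  # the later recovery

print(round(overall, 2), round(early, 2), round(later, 2))
```

Taken over the whole range, the downward and upward stretches nearly cancel and the coefficient is close to zero, while each stretch taken by itself shows a marked trend. A single linear coefficient is simply the wrong summary for data of this shape.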

Schooling. Terman finds that amount of schooling is related to
marital happiness scores as follows:

    Husband’s schooling with his own happiness      .06
    Husband’s schooling with wife’s happiness       .17
    Wife’s schooling with her own happiness         .07
    Wife’s schooling with husband’s happiness       .05

When these correlations are divided by their respective standard
errors, it is found that the correlation for husband’s schooling with
wife’s happiness is the only significant one. This suggested to
Terman the comparison presented in Table 121.
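
The division of a correlation by its standard error can be sketched as follows. The sketch rests on assumptions that the text does not state: that each coefficient is based on all 792 couples, that the standard error is given by the classical large-sample formula (1 - r^2) / sqrt(N - 1), and that a ratio of 3 or more is the mark of significance. The four coefficients are those listed above; the labels and names in the code are our own.

```python
from math import sqrt

N_COUPLES = 792          # assumption: every coefficient uses the full sample
SIGNIFICANT_RATIO = 3.0  # assumption: conventional cutoff for significance


def standard_error_of_r(r, n):
    """Classical large-sample standard error of a correlation coefficient."""
    return (1 - r * r) / sqrt(n - 1)


SCHOOLING_CORRELATIONS = {
    "husband's schooling, own happiness": 0.06,
    "husband's schooling, wife's happiness": 0.17,
    "wife's schooling, own happiness": 0.07,
    "wife's schooling, husband's happiness": 0.05,
}

ratios = {
    label: r / standard_error_of_r(r, N_COUPLES)
    for label, r in SCHOOLING_CORRELATIONS.items()
}

for label, ratio in ratios.items():
    verdict = "significant" if ratio >= SIGNIFICANT_RATIO else "not significant"
    print(f"{label}: {ratio:.2f} ({verdict})")
```

Under these assumptions only the correlation of .17 clears the cutoff, which agrees with the statement that it is the only significant one of the four.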


TABLE 121. Mean Happiness Scores According to Relative Amount of Schooling*

Schooling of husband             | Husband | Wife
Five or more years more          |  68.3   | 72.0
Five or more years less          |  67.8   | 62.8
Critical ratio of difference     |   0.2   |  3.0

* From Terman, L. M. Psychological Factors in Marital Happiness. New York:
McGraw-Hill Book Company, Inc., 1938.


Thus the schooling of a husband is, again, found to be unrelated
to his own happiness but significantly related to that of his wife.
Wives whose husbands have had five or [...]

Our point in mentioning this item is to show that even though a
difference is significant it must still be interpreted. And in this [...]

Relative Mental Ability. Each spouse was asked to indicate whether 
the other spouse was equal to the rater in mental ability or possessed 
more or less mental ability than the rater. Obviously these ratings 
were subject to the usual types of error, but certain consistencies, 
nevertheless, emerge. The optimum state of marital happiness, that 
is, both partners being equally happy, occurs when they are equal 
to each other in mental ability. When there is a difference in mental 
ability, the partner of superior ability tends to be less happy than 
the partner of lesser mental ability. 

Marital Happiness of Parents. The marital happiness of parents
correlates .25 and .21 with the happiness of the present marriage.
These correlations are clearly significant, although moderate, and
are consistent with the mean scores presented in Table 122. We
present these data to show the remarkable consistency in trend to
which a fairly moderate degree of correlation can give rise.

TABLE 122. Mean Happiness Scores According to Rated Happiness of Parents’
Marriage*

Rated happiness of parents          | Number | Husband | Number | Wife
Extraordinarily happy               | 83     | . .     | . .    | . .
Decidedly more happy than average   | . .    | . .     | . .    | 71.8
Somewhat more happy than average    | . .    | . .     | 165    | 69.9
About average                       | . .    | . .     | . .    | . .
Somewhat less happy than average    | . .    | . .     | . .    | . .
Decidedly less happy than average   | . .    | 65.6    | 67     | 61.0
Extremely unhappy                   | . .    | . .     | . .    | . .

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.

Oedipus Complex. We should like to show now a type of analysis
taking into account the interrelationships of two variables with each
other and their individual and joint relationships to marital
happiness. This is the type of data subject to treatment by partial or
multiple correlation or by the analysis of variance. But we have in
mind here a simpler type of analysis and one which can be used by a
person not versed in some of the more advanced statistical
techniques. The data we shall discuss are the rated attractiveness of the
opposite sex parent, the rated resemblance between the opposite sex
parent and the spouse, and the marital happiness of the rater. The
data we shall discuss are presented in Table 123.

The arrangement of the table shows that the first thing necessary
is the preparation of a scatter plot designed to show the relation 
between the rated attractiveness of the opposite sex parent and the 
rated resemblance between the Opposite sex parent and spouse. 
Then, having prepared such a table and knowing the number of 
cases in each cell, and their identity, we must compute the mean 
happiness scores for the cases represented. This done, we are in a 


TABLE 123. Mean Happiness Scores According to Rated Attractiveness of Opposite
Sex Parent and Rated Resemblance between Spouse and Opposite Sex Parent*

Resemblance between wife and mother

Husband’s rating of mother   | Very close | Some | None | Opposite types
Exceptionally attractive     | 68         | 63   | 71   | 69
Above average                | 63         | 70   | 70   | 66
Just average                 | 68         | 68   | 70   | 68
Below average                | . .        | . .  | 65   | 60

Resemblance between husband and father

Wife’s rating of father      | Very close | Some | None | Opposite types
Exceptionally attractive     | 78         | 72   | 75   | 61
Above average                | 74         | 72   | 69   | 66
Just average                 | 54         | 74   | 69   | 69
Below average                | . .        | 39   | 67   | 72

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.


position to study the relation between rated attractiveness of
opposite sex parent and marital happiness, with influence of rated
resemblance of opposite sex parent to spouse held constant; rated
resemblance of opposite sex parent to spouse and marital happiness,
with influence of rated attractiveness of opposite sex parent held
constant; and the joint relation of rated attractiveness of opposite
sex parent and rated resemblance of opposite sex parent to spouse
and marital happiness.

First, we can study the mean happiness scores in each column and
see how they vary with differing degrees of attractiveness of the
opposite sex parent. In the first column (that for very close
resemblance between opposite sex parent and spouse) we see that mean
happiness scores decline as attractiveness declines. This same trend 
holds for the second column and also for the third. We discover in 
the column for opposite types, however, that the trend is reversed. 
Next, we can follow the mean happiness scores across by rows and
find that for the first three rows the greater the resemblance between
opposite sex parent and spouse, the greater the marital happiness.
But for the last row (when the opposite sex parent is rated below
average in attractiveness) the trend is reversed. To consider the
joint relation between rated attractiveness of opposite sex parent
and rated resemblance of opposite sex parent to spouse, let us start
with the highest mean we can find. This is 78 in the upper left-hand
cell. In other words, the happiest wives are those whose fathers were
exceptionally attractive and whose husbands resemble these fathers
very closely. As we go down the column from the first cell or as we
go across the row, we find that the happiness scores decrease. In
other words, from the optimum condition a lessening in either the
attractiveness of the father or in the resemblance between husband
and father brings a lessening also in the happiness of the wives concerned.
This finding is consistent with that which we reach by starting with
the mean value of 72 in the lower right-hand cell. This represents
the mean happiness of wives whose fathers are below average in
attractiveness but whose husbands are opposite in type. As we
proceed up the column, we find a lowering of happiness score, or as
we proceed left along the bottom row, we find a lowering of the
happiness scores. In other words, if the father is below average in
rated attractiveness, it is well for the wife to take to herself a husband
of opposite type. We might summarize this discussion by saying that
we have been attempting to prove in a hard and laborious and
devious statistical way that which any young maiden could have
told us: that personal attractiveness is important in the choice of a
husband, and in general, the more attractive he is, the happier she
will be.
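
The two-way tabulation behind a table such as Table 123 can be sketched in a few lines. Each case is sorted into a cell by its two ratings, and the mean happiness score and the number of cases are then computed for every cell. The records below are invented for illustration, and the function name is ours; nothing here reproduces Terman's working procedure.

```python
from collections import defaultdict

def cell_means(records):
    """Return (means, counts) keyed by (row_label, column_label).

    records: iterable of (row_label, column_label, happiness_score).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row, column, score in records:
        sums[(row, column)] += score
        counts[(row, column)] += 1
    means = {cell: sums[cell] / counts[cell] for cell in sums}
    return means, dict(counts)

# Hypothetical cases: (rating of father, resemblance of husband, happiness).
cases = [
    ("exceptionally attractive", "very close", 80),
    ("exceptionally attractive", "very close", 76),
    ("below average", "very close", 60),
    ("below average", "opposite types", 72),
]

means, counts = cell_means(cases)
for cell in sorted(means):
    print(cell, f"mean = {means[cell]:.1f}", f"n = {counts[cell]}")
```

Keeping the cell counts beside the means matters, because a mean resting on only a handful of cases deserves little weight in reading the rows and columns.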

Sex Adjustment. We can now turn to a study of some of the
sex-adjustment items and examine their relationship to marital
happiness. First we shall list the topics which Terman investigated. They
are given in Table 124. We shall not need to discuss Terman’s
findings for each one of these topics, but we shall, as we did in the case
of the background items, select a few to show the methods of
analyses which Terman employed.

TABLE 124. Sex-adjustment Items Investigated by Terman
1. Frequency of intercourse
2. Preferred frequency of intercourse
3. Relative passionateness
4. Refusal of intercourse
5. Orgasm adequacy
6. Duration of intercourse
7. Desire for extramarital intercourse
8. Homosexual attraction
9. Wife’s response to first intercourse
10. Contraceptive practices
11. Wife’s rhythm of sexual desire
12. Sexual complaints

Frequency of Sexual Intercourse. We shall first discuss Terman’s
findings on the significance of the frequency of sexual intercourse in
relation to marital happiness. We do this to show how the influence
of a third variable, in this case, age, must be eliminated before the
true relation between frequency of intercourse and marital happiness
can be adequately ascertained.

Let us start naively, however, and examine the over-all data.
Table 125 shows mean happiness scores for husbands and wives


TABLE 125. Mean Happiness Scores in Relation to Monthly Intercourse Frequency*

Monthly frequency | Number | Men  | Women
Over 10           | 80     | 69.8 | 73.4
7-10              | 159    | 70.2 | 70.3
3-6               | 374    | 68.5 | 69.1
1-2               | 153    | 65.7 | 67.4
. . .             | 18     | 60.0 | 51.5

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.

reporting different intercourse frequencies. There certainly appears
to be a relation to marital happiness, rather more striking for wives
than for husbands, but the Pearsonian coefficients are only .09 and
.12 for husbands and wives respectively.

But now we ask whether we can accept these data without
considering age, for it does not seem unreasonable for us to expect a
decline in intercourse frequency with advancing age. We should
like to know how this decline might affect our conclusion with regard
to the relation of intercourse frequency to marital happiness. 

We give in Table 126 the median intercourse frequency as it 
varies with the age of Terman’s subjects. There is no doubt about a 
striking relationship. Intercourse frequency definitely declines with 
advancing age. The Pearsonian coefficients expressing this
relationship are -.30 for husbands and -.33 for wives.

TABLE 126. Monthly Intercourse Frequency in Relation to Age*

Men                             | Women
Age         | Number | Median   | Age          | Number | Median
60 and over | 15     | 1.4      | 55 and over  | 20     | 1.2
50-59       | 80     | 2.9      | 45-54        | 88     | 2.8
40-49       | 219    | 4.1      | 35-44        | 281    | 4.1
30-39       | 340    | 5.0      | 25-34        | 333    | 5.5
20-29       | 127    | 6.3      | 24 and under | 60     | 7.2

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.


Now that we have verified the hypothesis that age affects
intercourse frequency, how is age to be eliminated as an unwanted factor
in the data we presented in Table 125? Terman eliminated it by
preparing a second table in which age was held constant. Before he
did this, however, he divided his total subject population into the
following four age groups: below 30; 30 to 39; 40 to 49; and 50 and
over. Then he computed the correlation between intercourse
frequency and marital happiness for each of these groups. These
correlations were .03, .18, .08, and .03. Thus there is a change in the
importance of intercourse frequency with different age groups. The
most marked relation occurs in the age range 30 to 39, so we present
Terman’s additional data for this group in Table 127.

Here we see the same trend we saw in Table 125, but it is brought
out much more clearly. Thus the elimination of age clarifies the
interpretation of the original set of data.
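
Holding age constant by grouping, as just described, can be sketched as follows. The age boundaries are the four given above; the records, the function names, and the use of a plain product-moment coefficient within each group are our assumptions, made for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def age_group(age):
    """Assign an age to one of the four groups named in the text."""
    if age < 30:
        return "below 30"
    if age < 40:
        return "30 to 39"
    if age < 50:
        return "40 to 49"
    return "50 and over"

def correlations_by_age(records):
    """records: iterable of (age, monthly_frequency, happiness_score)."""
    groups = {}
    for age, frequency, happiness in records:
        groups.setdefault(age_group(age), []).append((frequency, happiness))
    return {
        name: pearson_r([f for f, _ in pairs], [h for _, h in pairs])
        for name, pairs in groups.items()
    }

# Hypothetical records, invented for illustration only.
records = [
    (25, 4, 60), (27, 8, 70), (28, 6, 65),
    (33, 2, 55), (35, 6, 75), (38, 10, 70),
    (42, 1, 62), (45, 3, 66), (48, 5, 64),
    (52, 1, 60), (58, 2, 61), (61, 4, 66),
]

by_age = correlations_by_age(records)
for name, r in by_age.items():
    print(f"{name}: r = {r:.2f}")
```

Because each coefficient is computed within a narrow band of ages, differences in age can contribute little to it, which is the sense in which age has been eliminated.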

TABLE 127. Monthly Intercourse Frequency in Relation to Marital Happiness for Ages
30 to 39*

Frequency | Number | Husbands | Wives
Over 10   | 32     | 69.2     | 74.0
7-10      | 81     | 70.9     | 72.2
3-6       | 153    | 66.0     | 68.6
1-2       | 56     | 61.4     | 65.7
. . .     | 3      | 44.8     | 42.5

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.

Preferred Intercourse Frequency. Table 128 shows mean happiness
scores in relation to preferred frequency of intercourse. There is,
here, no question of a marked relationship. This result is
straightforward and requires no additional comment.

TABLE 128. Mean Happiness Scores in Relation to Preferred Monthly Frequency of
Intercourse*

Frequency | Number | Husbands | Number | Wives
. . .     | 119    | 64.1     | 73     | 76.5
. . .     | 217    | 68.0     | 151    | 69.4
. . .     | 295    | 69.7     | 288    | 68.8
. . .     | 86     | 71.6     | 134    | 67.
. . .     | . .    | 57.3     | 22     | 57.8

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.

Terman points out that more important, probably, than
intercourse frequency per se or preferred intercourse frequency per se is
the relationship between them. That is, in spite of the relationships
we have demonstrated, it would be possible for a person to be
unhappy if his preferred and actual intercourse frequency were not in
near agreement with each other. To get at this relationship, Terman
computed a hunger-satiety index by dividing reported frequency of
intercourse by preferred frequency of intercourse. When this ratio is
1.00, actual frequency and preferred frequency are in agreement.
When it is more than 1.00, actual frequency exceeds preferred
frequency and a state of satiety is approached. When the ratio is
less than 1.00, preferred frequency exceeds actual frequency and a
state of sex hunger is approached. A summary of the ratios for all of
Terman’s subjects is given in Table 129, and Table 130 shows the
relation of these ratios to marital happiness. This last table looks
like an old friend, so our interpretation of the data it contains should
not be difficult. We see that happiness scores increase as we proceed
either from marked hunger or from marked satiety toward a position
of optimum ratios. This is true for husbands and wives separately
as well as jointly considered.
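
The hunger-satiety index described above is a simple ratio, and the class limits printed in Table 129 can be applied to it directly. In the sketch below the function names are ours, the labels "moderate hunger" and "moderate satiety" are our reading of the two rows marked simply "moderate", and rounding the ratio to two decimals before classifying is an assumption made so that no value falls between the printed limits.

```python
def hunger_satiety_ratio(reported, preferred):
    """Reported monthly frequency divided by preferred monthly frequency."""
    if preferred <= 0:
        raise ValueError("preferred frequency must be positive")
    return reported / preferred

def classify_ratio(ratio):
    """Place a ratio in one of the five classes of Table 129."""
    r = round(ratio, 2)  # assumption: classify on the two-decimal value
    if r < 0.59:
        return "marked hunger"
    if r <= 0.90:
        return "moderate hunger"
    if r <= 1.10:
        return "optimum"
    if r <= 1.70:
        return "moderate satiety"
    return "marked satiety"

# Hypothetical (reported, preferred) pairs, invented for illustration.
for reported, preferred in [(3, 10), (8, 10), (6, 6), (12, 10), (20, 10)]:
    ratio = hunger_satiety_ratio(reported, preferred)
    print(f"{reported}/{preferred} = {ratio:.2f}: {classify_ratio(ratio)}")
```

A ratio cannot be formed when the preferred frequency is zero, so the sketch rejects that case rather than guessing how it was handled.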


TABLE 129. Percentage of Subjects with Different Sex Hunger-Satiety Ratios*

Hunger-satiety ratio              | Husbands | Wives
A. Under .59 (marked hunger)      | 24.3     | 13.6
B. .59-.90 (moderate)             | 19.4     | 8.8
C. .91-1.10 (optimum)             | 52.9     | 54.1
D. 1.11-1.70 (moderate)           | 1.1      | 8.1
E. 1.71 up (marked satiety)       | 2.3      | 15.4

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.


TABLE 130. Mean Happiness Scores in Relation to Sex Hunger-Satiety*

Hunger-satiety ratio        | Marked hunger | Moderate | Optimum | Moderate | Marked satiety | Total
Marked hunger               | 56            | 64       | 65      | 71       | 57             | 61
Moderate                    | 57            | 72       | 74      | 64       | 64             | 68
Optimum                     | 61            | 71       | 75      | 73       | 69             | 73
Moderate and marked satiety | 54            | 65       | 75      | 78       | 63             | 66
Total                       | 58            | 69       | . .     | . .      | 64             | 69

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.


Relative Passionateness. One of the questions which Terman asked 
his subjects was worded as follows: 


Do you think your wife (husband) is more or less passionate than you are? (check)
much more ____, somewhat more ____, same ____, somewhat less ____, much
less ____.


The “replies to this item were treated,” Terman says, “by
averaging the ratings of husband and wife and coding the result so as to
yield nine degrees of difference in the composite ratings.” Table 131
shows the results and the relationship of relative passionateness to
marital happiness. As we should have been led to expect from the
preceding section, optimum happiness results when the husband and
wife are equally passionate. When a discrepancy in passion occurs,
the less passionate partner appears to be the less happy member of
the pair. The data in columns 4 and 5 of Table 131 offer a type of validation
of the passionateness ratings. We see a definite equality in preferred
monthly intercourse frequency when husbands and wives consider
themselves equally passionate, and we see discrepancies in the
expected directions as we depart from this optimum state of equality.


TABLE 131. Mean Happiness Scores in Relation to Relative Passionateness*

                      |        | Mean happiness   | Number of copulations preferred per month
Relative passion      | Number | Husband | Wife   | Husband | Wife
1. Husband more       | 112    | 60      | 61     | 9.0     | 4.6
2. Husband more       | 106    | 68      | 67     | 8.5     | 5.8
3. Husband more       | 130    | 73      | 75     | 8.1     | 6.4
4. Husband more       | 110    | 73      | 73     | 8.2     | 8.0
5. Equality           | 121    | 74      | 75     | 8.6     | 8.7
6. Husband less       | 67     | 68      | 70     | 8.3     | . .
7, 8, 9. Husband less | 51     | 64      | 65     | 6.2     | 7.6

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.


TABLE 132. Sex-adjustment Items Related to Marital Happiness*
1. Sex hunger and satiety (ratio of reported to preferred frequency of intercourse)
2. Duration of intercourse (wife only)
3. Rated relative passionateness of spouses
4. Wife’s orgasm adequacy
5. Husband’s asserted ability to prolong intercourse
6. Release and satisfaction from intercourse
7. Forwardness of wife
8. Overmodesty or prudishness of wife
9. Wife’s demand for foreplay
10. Desire for extramarital intercourse
11. Refusal of intercourse
12. Attitude on being refused intercourse
13. Number of sexual complaints
14. Wife’s fear of pregnancy
15. Wife’s pain at first intercourse
16. Wife’s enjoyment of first intercourse
17. Time before wife experienced first orgasm
18. Premarital intercourse

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.

It will not be necessary for us to continue further this item-by-item
review of Terman’s sex-adjustment items. We have illustrated
all the different techniques involved and can list the sex-adjustment 
items showing a significant relation to marital happiness. They are 
given in Table 132. 

Relative Predictions. One of Terman’s main purposes was to 
determine the relative importance of personality, background, and 
sex-adjustment factors in predicting or influencing marital happi- 
ness. We have reviewed his studies separately in each of these 
areas, but now we wish to bring the various results together. This 
can best be done by means of the data presented in Table 133. This 
table shows the correlations between the various factors we have 
discussed and marital happiness. 

TABLE 133. Correlations between the Scores on the Various Prediction Scales and
Marital Happiness*

Item                                     | Husbands | Wives
1. Bernreuter . . .                      | .38      | .42
2. . . .                                 | .36      | .35
3. Opinion items                         | .22      | .22
4. Personality total                     | .47      | .46
5. Background items                      | .35      | .29
6. Personality and background            | .54      | .47
7. Sex adjustment items                  | .49      | .49
8. Personality, background and sex       | .49      | . .

* From Terman, L. M. Psychological Factors in Marital Happiness. New York: McGraw-
Hill Book Company, Inc., 1938.

Unfortunately, these correlations have an element of spuriousness
in them because they are based upon the same group of cases that
was used in deriving the various scoring weights. Nevertheless, we
are probably safe in concluding that there is a significant
relationship between the total of all factors and marital happiness.

The sex-adjustment items, according to the data in Table 133,
show the highest correlation with marital happiness, the personality
factors run a close second, and the background items are third.
Background and personality factors are just about equal to the
sex-adjustment factors in relation to marital happiness. This is
important, for the only information available prior to the marriage,
and the information upon the basis of which our predictions have to
be made, is that concerning the personality and background items.


SAMPLING INTERDEPENDENCE


In the two studies we have just reviewed, it was necessary for 
Burgess and Cottrell, and for Terman, to sort their subjects accord- 
ing to their answers to one question at a time and to determine 
separately for each of these sortings what significance it had in 
relation to marital happiness. This creates a situation in which the 
successive samples (obtained from the successive sortings) cannot 
be considered independent of each other with regard to the variables 
under study. A corrective course of action is obvious, but this course 
of action is usually so prohibitively expensive that it cannot be 
followed. Let us realize, therefore, the limitations with which the 
course of action actually followed leaves us. 

We bring out one of the most important of these limitations in 
Tables 134 and 135. Both of these tables show distributions of 
marital happiness or marital adjustment scores in relation to the 
number of cases upon which they are based. Table 134 presents the 
results for Burgess and Cottrell’s study and Table 135 presents the 
results for Terman’s. Two facts stand out clearly from these tables. 
Mean marital adjustment scores or marital happiness scores are 
related to the number of cases upon which they are based, and the 
standard deviations of the distributions of marital adjustment or 
marital happiness scores are also related to the number of cases upon
which they are based. . . . the number of cases giving . . . chance
there is of finding . . . significant relation to marital . . . by our
finite sample. . . . calls into question the value of any study which
does not include a means of processing data in such a way as to make
allowance for it.


THE APTITUDE INDEX 


Our third example of a prognostic approach to the measurement of
adjustment is contained in the Aptitude Index. This is a test
designed to predict success in life insurance selling. If a person is well
adjusted in this business, he will become successful in it. But if he
finds himself maladjusted, he will be unsuccessful and will soon
leave it. We could say this about many other occupations, but since 
it is relatively so easy to get into the life insurance business—and so 
easy to leave it—it has become a glaring example whenever we think 
of poor occupational adjustment. 

The Aptitude Index was designed to cut down on this degree of 
maladjustment. It was designed to be used by life insurance com- 
panies to weed out ahead of time a large proportion of those ap- 
plicants who cannot adjust themselves satisfactorily to the life 
insurance business. It can also be used by applicants themselves as 
an aid to them in reaching their own decisions as to whether the 
selling end of life insurance is a suitable line of endeavor for them. 

The Aptitude Index is a unique test. It is the product of the Life
Insurance Agency Management Association (formerly the Life
Insurance Sales Bureau) and is published by this agency for the 
use of its member companies. Since the Life Insurance Agency 
Management Association has more than 200 members, this means 
that the Aptitude Index is available for use by a large segment of 
the life insurance business. 

There are two parts to the Aptitude Index. Part I consists of 10
items similar to those on many a business application blank. It calls 
for an applicant’s age, number of dependents, occupation, number of 
organizations in which membership is held, net worth, and so forth. 
Part II consists of 81 personality items of the Bernreuter type and 
15 questions similar to those in Part II of the Allport-Vernon Study 
of Values. 

Development of Test. Those responsible for the development 
of the Aptitude Index (chiefly Albert K. Kurtz and Arthur W. 
Kornhauser) have not published fully on the methodology used in 
its development. However, we can report on the general ideas which 
were involved. Kurtz reports that the study leading to Part I was 


. . . based upon the records of 10,111 men without previous life insurance selling
experience, who were contracted as full-time agents during 1933, 1934, and 1935.
These men were contracted by eleven different companies operating throughout the
United States.

Data were gathered and analyzed on 24 personal history items and a scoring
system was devised so as to give good prediction in terms of the following measures:

1. Whether or not the agent remained under contract for 12 months
2. Whether or not the agent remained under contract for 24 months
3. Paid-for production during the first 12 months, for agents who remained in
the business that long
4. Paid-for production during the first 24 months, for agents who remained in
the business that long


After determining the relative importance of each of the 24 items, 10 were selected 
as giving the best predictions of first and second year production, and also giving 
good predictions of whether or not an agent would remain in the business. 


The scoring weights assigned to the various items, and their 
alternate answers, vary from 0 to 13. Those assigned for net worth, 
which will suffice for an example, are given in Table 136. 


TABLE 136. Scoring Weights for Net Worth*

Net worth          | Weight
$15,000 or more    | 10
$10,000-$14,999    | 8
6,000- 9,999       | 6
1,000- 5,999       | 4
0- 999             | 2

* Adapted from Kurtz, A. K. How Well Does the Aptitude Index Work? Hartford, Conn.:
Life Insurance Sales Research Bureau, 1941.
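
The weights of Table 136 amount to a lookup from net worth to a score, and the total score on Part I is the sum of ten such weights. The sketch below codes only the net-worth item; the label of the lowest bracket is partly illegible in our copy of the table, so treating every amount below $1,000 as weight 2 is an assumption, and the other nine weights in the example are invented placeholders.

```python
def net_worth_weight(net_worth_dollars):
    """Scoring weight for net worth, following the brackets of Table 136.

    Assumption: every amount below $1,000 falls in the lowest bracket.
    """
    if net_worth_dollars >= 15000:
        return 10
    if net_worth_dollars >= 10000:
        return 8
    if net_worth_dollars >= 6000:
        return 6
    if net_worth_dollars >= 1000:
        return 4
    return 2

def total_score(item_weights):
    """Part I total: the sum of the weights earned on the ten items."""
    if len(item_weights) != 10:
        raise ValueError("Part I is scored on exactly 10 items")
    return sum(item_weights)

# Hypothetical applicant: the nine other weights are placeholders.
other_item_weights = [5, 7, 3, 9, 4, 6, 2, 8, 5]
applicant_total = total_score(other_item_weights + [net_worth_weight(12000)])
print(applicant_total)
```

Turning the total into a letter rating requires the published norms, which are not reproduced here, so the sketch stops at the total.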


The total score for an applicant is obtained by adding the weights
for his answers to each of the 10 items. When this total is obtained,
it is compared with the norms for . . . converted into a letter
rating. Each rating, A, B, C, D, or E, indicates a given probability
of being able to achieve success as a life insurance salesman. In one
study of 408 agents not included in the original standardization
group, it was found that applicants who rated A produced 300 per
cent more business than did applicants who rated E. In a . . .
business than applicants who rated E.

Part II of the Aptitude Index was designed to measure some of the personality
characteristics which . . . experimentation has demonstrated to be important in
determining the success or failure of the new life insurance agent. In the
development of this part of the Aptitude Index a very large number of questions designed
to measure a number of different traits believed to be important were tried out.
The specific questions . . . retained are those which have actually proved their
value in differentiating between groups of su . . . those which experience
has shown to be of definite import . . . likely to succeed in the life insurance
business.

In the course of this experimentation approximately 500 questions
and 8 different tests designed to measure 38 personality
characteristics were tried. These were all tried, however, on men already
in the life insurance business. Therefore the questions which seemed 
of value were printed in a booklet called the “Personnel Blank” and 
were given to 1,433 applicants in 24 companies. When study on this 
group was complete, a final summary was prepared for the group 
of 211 agents we have already mentioned. The results are shown in 
Table 137. 


TABLE 137. Predictive Value of Aptitude Index*

        | Percentage of average
Score   | Part I | Parts I and II
A       | 195    | 206
B       | 120    | 137
C       | 63     | 78
D       | 76     | 39
E       | 47     | 41
Average | 100    | 100

* From Kurtz, A. K. How Well Does the Aptitude Index Work? Hartford, Conn.: Life
Insurance Sales Research Bureau, 1941.


Predictive Value. Column 1 of Table 137 shows the results when 
Part I of the Aptitude Index is used alone, and column 2 shows the 
results when Parts I and II are used together. Agents who score A
produce 195 per cent as much business (column 1) or 206 per cent as
much business (column 2) as the average agent, whose production was
equated to 100 per cent. Agents who score E produce only 47 per
cent as much business (column 1) or 41 per cent as much business
(column 2) as does the average agent. Clearly, the personality
characteristics measured by Part II of the Aptitude Index are of 
value in predicting adjustment in the life insurance business. Kurtz, 
reporting some of these results, says: 

There are a large number of factors other than the ability of the man in question 
which have an important bearing on the degree of success attained by a new man 
after entering the life insurance business. Nevertheless, for any given set of
circumstances, the probability of success is much higher for men with high ratings on the
Aptitude Index than for those with low ratings. 


In our discussion we have not attempted to cover the many follow-up
studies which have repeatedly demonstrated the value of the
Aptitude Index in predicting adjustment in the life insurance
business. But we must comment that these repeated validity studies 
demonstrate another way in which the Aptitude Index is a unique 
test. Few other tests in existence have received the thorough and 
persistent study that has been devoted to the Aptitude Index to 
give it its present degree of predictive or prognostic value. 


THE PERSONAL INVENTORY 


Our fourth example of a prognostic approach to the measurement 
of adjustment is that contained in the Personal Inventory developed 
by Shipley and Graham during World War II. Shipley and Graham 
developed this inventory in response to a request from the Office of 
the Commander in Chief, U.S. Fleet, and the Bureau of Naval Per- 
sonnel, who asked them “to develop a test for emotional stability.” 

As their first step in the development of the Personal Inventory, 
Shipley and Graham made “a detailed analysis of 100 psychiatric 
case histories drawn from the records of the Chelsea Naval
Hospital.” From these records they selected 300 items which seemed
capable of differentiating psychiatric patients from normal men.
They prepared all items in pairs, such as

I have felt bad more from head colds. I have felt bad more from dizziness.

so that a recruit in answering the questionnaire would always have
to choose one or the other of two alternatives. Shipley and Graham
prepared two forms of this inventory, one having 145 items (60 of 
which were scored) and one having 20 items (all of which were 
scored). 

Bray, who reports the development of the Personal Inventory, 
states that “after the test had been standardized and the scoring 
system stabilized, it was administered to various groups of men; the
results were filed and compared later with the psychiatrist’s verdict
on each man. . . . care was exercised to insure that the psychiatrist
was in ignorance of the test score at the time of the psychiatric
examination.” Thus the nature of the validation was to be a
comparison between test scores and psychiatric judgment as to whether
recruits were normal or sufficiently abnormal to be discharged from
the Navy. Two sets of results are presented in Table 138. The figures
in the first two columns of the table are the “best” results which
Bray reports and the figures in columns 3 and 4 are the “poorest.”
The figures given are cumulative percentages, so as one proceeds 
from the bottom to the top of the table, he will find the figures 
increase in magnitude. In each row he will find the percentages of 
normal and discharged groups receiving the designated or less favor- 
able scores on the inventory. Both sets of data indicate that the 
Personal Inventory is useful in predicting psychiatric classification. 
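
Cumulative percentages of this kind are easy to form from raw scores. The sketch below assumes, as the layout of Table 138 suggests, that higher scores are the less favorable ones, so that the figure for a given score is the percentage of a group scoring at or above it. The scores are invented for illustration and are not Bray's data.

```python
def percent_at_or_above(scores, cutoff):
    """Percentage of a group whose score is at or above the cutoff.

    Assumption: higher inventory scores are the less favorable ones.
    """
    flagged = sum(1 for score in scores if score >= cutoff)
    return 100.0 * flagged / len(scores)

# Hypothetical inventory scores, invented for illustration only.
normal_scores = [3, 5, 6, 8, 9, 10, 12, 4, 7, 16]
discharged_scores = [12, 14, 16, 17, 9, 15, 18, 13, 7, 16]

for cutoff in (8, 12, 16):
    normal_pct = percent_at_or_above(normal_scores, cutoff)
    discharged_pct = percent_at_or_above(discharged_scores, cutoff)
    print(f"score {cutoff} or higher: normal {normal_pct:.0f}%, "
          f"discharged {discharged_pct:.0f}%")
```

A cutoff is useful as a screen to the extent that it catches a large percentage of the discharged group while flagging only a small percentage of the normal group.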


TABLE 138. Predictive Value of the Personal Inventory*

Cumulative percentages

       | Best results        | Poorest results
Score  | Normal | Discharged | Normal | Discharged
7-8    | 68     | 96         | 76     | 89
10-11  | 47     | 91         | 50     | 80
12-13  | 29     | 85         | 29     | 67
14-15  | 21     | 82         | 17     | 58
16-17  | 10     | . .        | 6      | 49

* Adapted from Bray, C. W. Psychology and Military Efficiency. Princeton, N.J.:
Princeton University Press, 1948.

Bray reports that “the Personal Inventory was more successful 
as a psychiatric screen than any other test with which it was
compared. . . .” Its reliability (split-half) varied in different samples
from .66 to .91, and it was found to be useful in saving the time of a 
psychiatrist by showing him the men most in need of a psychiatric 
examination. 
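
A split-half reliability of the kind mentioned can be sketched as follows. The items are divided into odd and even halves, the two half scores are correlated, and the coefficient is stepped up by the Spearman-Brown formula to estimate the reliability of the full-length test. Whether the figures reported by Bray were stepped up in this way is not stated, so the correction, the odd-even split, and the responses below are all assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_brown(half_test_r):
    """Estimated full-test reliability from the correlation of two halves."""
    return 2 * half_test_r / (1 + half_test_r)

def split_half_reliability(responses):
    """responses: one list of item scores (0 or 1) per person."""
    odd_half = [sum(person[0::2]) for person in responses]
    even_half = [sum(person[1::2]) for person in responses]
    return spearman_brown(pearson_r(odd_half, even_half))

# Hypothetical scored responses of three men to four items.
responses = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

print(f"{split_half_reliability(responses):.2f}")
```

With so few men and items the figure means nothing in itself; the sketch shows only the order of the steps.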


10


RATINGS: NONANALYTICAL APPROACHES 


We are now to consider one of the most frequently used, and at the 
same time most frequently abused, methods of personality measure- 
ment: the rating technique. This technique is used whenever all 
other methods seem inadequate, whenever no other method is 
available, as a supplement to other techniques, and as an integral 
part of many other techniques of personality measurement. The 
method is frequently abused, however, because its apparent sim- 
plicity leads many untrained individuals to construct and to make 
use of rating scales without any concern whatsoever as to the objec- 
tivity, reliability, or validity of the results that may be secured. It 
is also abused because of the fact that it is used even when better 
methods of assessment are known to be available. 

We shall try in this and in the next chapter to describe some of 
the better standardized rating techniques, to point out some of 
the dangers of inadequate techniques, and to show in what ways the 
various varieties of the technique should and should not be applied. 
We can begin our discussion by pointing out that there should be 
in the use of any rating technique we have in mind, as in the use of 
any other technique of measurement, the accomplishment of two 
objectives. We want to be able to classify individuals upon some 
meaningful trait or variable, and we want to know the reasons for 
the placement of an individual in one category rather than another.
In accord with these two objectives, we shall classify rating methods
in two broad categories: analytical methods, which provide both
classifications and supporting reasons; and nonanalytical methods,
which provide classifications without supporting reasons. With this
distinction in mind we proceed with a discussion of several
nonanalytical rating techniques.


MULTIPLE-CHOICE RATING FORMS


A great many of the employee rating forms used by business and 
industrial organizations require ratings on personality traits and 
are secured on what we can call multiple-choice merit-rating forms. 
The term multiple-choice is more commonly applied to tests than to 
merit-rating forms, but its use in this connection is appropriate. 
Most of the merit-rating forms used by business and industrial 
organizations suggest several traits upon which an employee is to be 
rated and provide for several alternative ratings with respect to 
each of these traits. These alternatives are generally lettered or 
numbered, but frequently they may consist of a series of unlettered 
or unnumbered descriptive phrases set off in separate blocks. 
The following paragraphs contain a number of examples of rat- 
ing forms in current or recent use by various business and industrial 
organizations. 

Allison Division of General Motors. Provision is made for rating 
employees on volume of work, quality of work, knowledge of job, 
attitude toward supervision, cooperation with fellow workers, and 
10 other traits. On each of these traits the supervisor is asked to 
indicate whether the employee is (O) outstanding, (AA) above
average, (A) average, (BA) below average, or (US) unsatisfactory. 

Chase Brass & Copper Company. Provision is made for rating 
employees on quality of work, quantity of work, reliability, attitude, 
and flexibility. On each of these traits the supervisor is asked to 
indicate whether the employee is (O) outstanding, (G) good, (F)
fair, (MS) minimum satisfactory, (BS) below standard, or (U)
unsatisfactory.

Graybar Electric Company. Provision is made for rating employees 
on quality of work, volume of work, knowledge of assigned job,
interest in assigned job, dependability, initiative and ingenuity,
personal appearance, personality, cooperation, health and vitality,
and six other traits. On each of these traits the supervisor is asked
to indicate whether the employee is (1) poor, (2) fair, (3) average,
(4) good, or (5) best.

John Hancock Mutual Life Insurance Company. Provision is made
for rating employees on appearance, character, influence on others,
mental flexibility, concentration, imagination, ability to coordinate,
and inspirational and executive influence. On each of these traits
the supervisor is asked to indicate whether the employee is
outstanding, above standard, satisfactory, below standard, or
unsatisfactory; or, alternatively, whether the employee is one who far
exceeds requirements, exceeds requirements, meets requirements, 
partially meets requirements, or does not meet requirements. 

F. L. Hudson Company. Provision is made for rating employees on 
management and direction, leadership, coordination, and depend- 
ability. On each of these traits the supervisor is asked to indicate 
whether the employee is superior, good, fair, or poor. 

London Life Insurance Company. Provision is made for rating 
employees on personality, disposition and cooperation, depend- 
ability, initiative, and judgment. On each of these traits the super- 
visor is asked to indicate whether the employee is outstanding, above 
average, average, or below average.

McKesson & Robbins Company. Provision is made for rating
employees on accuracy, output, adaptability, dependability, and
cooperation. On each of these traits the supervisor is asked to
indicate whether the employee is outstanding, superior, better than
satisfactory, satisfactory, or unsatisfactory.


National City Bank & Trust Company. Provision is made for rating
employees on cooperation, thoroughness, resourcefulness, grooming,
manner of speech, tactfulness, self-confidence, judgment, and
initiative. On each of these traits the supervisor is asked to indicate
whether each employee is superior, above average, average, below
average, or unsatisfactory.

Pure Oil Company. Provision is made for rating employees on
quantity of work, quality of work, knowledge of work, use of working
time, cooperation, and initiative. On each of these traits the
supervisor is asked to indicate whether the employee is to be rated in a
first, second, third, fourth, or fifth class.

Union Central Life Insurance Company. Provision is made for
rating employees on quantity of work, quality of work, knowledge
of work, carrying out instructions, judgment, and working with
others. On each of these traits the supervisor is asked to indicate
whether the employee is above average, average, or below average.

The foregoing list is rather long, but by it the author hopes to
demonstrate the great preponderance of personality traits, the
“multiple-choice” nature of the scales, and the great variation from
one company to another. Unfortunately there is very little published
information (the author will hazard a guess there is none) concerning
the objectivity, reliability, and validity of any of the foregoing
techniques. We cannot say, therefore, how well or how poorly these
and many other rating scales used in business and industry serve
their intended purposes.


NUMERICAL RATING SCALES 


In the preceding section we made several references to numerical- 
type rating scales. The rating method used by the Graybar Electric 
Company requires a supervisor, in rating an employee, to circle or 
underline a 1, 2, 3, 4, or 5. The method used by the Pure Oil Com- 
pany is similar but instead of the numbers 1, 2, 3, 4, 5, the adjectives 
first, second, third, fourth, and fifth, are used. It is to be noted that 
most of the other rating methods described, even those making use 
of descriptive adjectives, could also have been set up on a numerical 
basis. 

Crooks. It is not uncommon for us to find a numerical type scale 
combined with a descriptive-adjective scale. Obviously the two 
extremes of any scale must be defined, and it is a small step from 
this to providing a descriptive term for each of the positions on a 
numerical rating scale. We find, however, that most numerical
scales with descriptive adjectives for all alternatives are frequently
limited to five steps. It is difficult to think of a sufficient number of
descriptive adjectives for all steps on a 7, 9, or 11 step scale. This
can be done, however, and is illustrated in a rating form developed
by William R. Crooks in cooperation with the Clerical Salary Study
Committee of the Life Office Management Association. This scale
is reproduced in Fig. 8. Its purpose is to provide a means of
permitting a supervisor to rate an employee on the same variables as those
underlying the Life Office Management Association’s Point Plan of
Job Evaluation. 

Ferguson. A second illustration in which a numerical scale is 
combined with a descriptive adjective scale is given in Fig. 9. This
form was developed by the author and has been used, among others,
by the Clerical Salary Study Committee of the Life Office
Management Association. It was used by this committee to secure ratings
on 30 traits for more than 3,000 clerical employees. 

The instructions for the use of the scale we have just mentioned
are given in Table 139. In one of the studies in which this scale was
used, more than 50 cooperating life insurance companies were asked
to rate typical cross sections of their home-office-employee groups.
In spite of this admonition the rating distributions showed a marked
degree of skewness. Therefore either the cooperating companies did not
select random or representative cross sections of their employees, or
the supervisors were not so objective as might have been desired, or
the employees rated represent a truly superior portion of the general
population. Probably all three of these factors contributed to the
results, but the first two must carry the greatest responsibility.


Fig. 8. Rating scale developed by William R. Crooks for the Life Office
Management Association. (From Life Office Job Evaluation Plans. New York:
Life Office Management Association, 1941.)


If the Agent's performance is DISTINCTLY SUPERIOR . . . . . . . . . circle the . . . 9  (4%)
If the Agent's performance is CONSIDERABLY ABOVE AVERAGE . . . . . circle the . . . 8  (7%)
If the Agent's performance is MODERATELY ABOVE AVERAGE . . . . . . circle the . . . 7 (12%)
If the Agent's performance is SLIGHTLY ABOVE AVERAGE . . . . . . . circle the . . . 6 (17%)
If the Agent's performance is AVERAGE . . . . . . . . . . . . . . . circle the . . . 5 (20%)
If the Agent's performance is SLIGHTLY BELOW AVERAGE . . . . . . . circle the . . . 4 (17%)
If the Agent's performance is MODERATELY BELOW AVERAGE . . . . . . circle the . . . 3 (12%)
If the Agent's performance is CONSIDERABLY BELOW AVERAGE . . . . . circle the . . . 2  (7%)
If the Agent's performance is DISTINCTLY INFERIOR . . . . . . . . . circle the . . . 1  (4%)

Figures in parentheses: Percent of Agents who should receive each rating.

Fig. 9. A descriptive adjective numerical rating scale. (From Agents’ Experimental
Performance Ratings. New York: Metropolitan Life Insurance Company, 1950.)


Table 139. Instructions for Using the Rating Scale in Fig. 9*

(a) Draw a circle around the figure 9 after the name of each Agent whose present job
performance is distinctly superior and is better than that of at least 96 per cent of all
Agents with whose job performance you are acquainted.

(b) Draw a circle around the figure 8 after the name of each Agent whose present job
performance is considerably above average and is better than that of at least 89 per cent,
but not better than that of 95 per cent, of all Agents with whose job performance you are
acquainted.

(c) Draw a circle around the figure 7 after the name of each Agent whose present job
performance is moderately above average and is better than that of at least 77 per cent,
but not better than that of 88 per cent, of all Agents with whose job performance you are
acquainted.

(d) Draw a circle around the figure 6 after the name of each Agent whose present job
performance is slightly above average and is better than that of 60 per cent, but not
better than that of 76 per cent, of all Agents with whose job performance you are
acquainted.

(e) Draw a circle around the figure 5 after the name of each Agent whose present job
performance is average and is better than that of at least 40 per cent, but not better than
that of 59 per cent, of all Agents with whose job performance you are acquainted.

(f) Draw a circle around the figure 4 after the name of each Agent whose present job
performance is slightly below average and is better than that of at least 23 per cent, but
not better than that of 39 per cent, of all Agents with whose job performance you are
acquainted.

(g) Draw a circle around the figure 3 after the name of each Agent whose present job
performance is moderately below average and is better than that of at least 11 per cent,
but not better than that of 22 per cent, of all Agents with whose job performance you
are acquainted.

(h) Draw a circle around the figure 2 after the name of each Agent whose present job
performance is considerably below average and is better than that of at least 4 per cent,
but not better than that of 10 per cent, of all Agents with whose job performance you are
acquainted.

(i) Draw a circle around the figure 1 after the name of each Agent whose present job
performance is distinctly inferior and is no better than that of at least 3 per cent of all
Agents with whose job performance you are acquainted.

For the small group of Agents whose performance you will rate, it is difficult to determine
the proper proportions, if any, of the various ratings to be assigned. If, however, you were to
rate the performance of 100 (or, better, 1,000) Agents, you would find that the ratings would
distribute themselves approximately as follows:

Rating   Description of rating                      Per cent of agents who
                                                    should receive each rating

  9      Distinctly superior performance                      4
  8      Considerably above average performance               7
  7      Moderately above average performance                12
  6      Slightly above average performance                  17
  5      Average performance                                 20
  4      Slightly below average performance                  17
  3      Moderately below average performance                12
  2      Considerably below average performance               7
  1      Distinctly inferior performance                      4

* From Agents’ Experimental Performance Ratings. New York: Metropolitan Life Insurance
Company, 1950.
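
The percentage bands in the instructions of Table 139 amount to a simple lookup rule. The sketch below is our own illustration, not part of the published form. It assumes that an Agent's standing is expressed as a whole number from 0 to 99 (the per cent of comparable Agents he surpasses), and the names BANDS and rating_for_percentile are hypothetical.

```python
# Lower bound of each percentage band in Table 139, highest rating first.
# A rating applies when the Agent's performance is better than that of at
# least `lower` per cent of all Agents known to the rater.
BANDS = [(96, 9), (89, 8), (77, 7), (60, 6), (40, 5),
         (23, 4), (11, 3), (4, 2), (0, 1)]


def rating_for_percentile(pct_surpassed):
    """Return the 1-9 rating for an Agent who surpasses `pct_surpassed`
    per cent (a whole number from 0 to 99) of comparable Agents."""
    if not 0 <= pct_surpassed <= 99:
        raise ValueError("percentage surpassed must lie between 0 and 99")
    for lower, rating in BANDS:
        if pct_surpassed >= lower:
            return rating


# Tally how many of the 100 possible standings fall in each band.
counts = {rating: 0 for _, rating in BANDS}
for pct in range(100):
    counts[rating_for_percentile(pct)] += 1
print(counts)
```

Under this assumption the tally reproduces the expected distribution printed with the instructions: 4, 7, 12, 17, 20, 17, 12, 7, and 4 per cent for ratings 9 through 1.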

It cannot be proved, of course, that skewed distributions are 
erroneous. But neither can it be proved that they are correct. 
Psychologists are usually wary of accepting such skewed distribu- 
tions, however, as representing the true state of affairs with respect 
to the distributions of the variables in the population being studied. 
We can think of too many reasons: the tendency for a supervisor to 
be lenient, the tendency for a supervisor to give the employee the 
benefit of the doubt, the tendency for a supervisor to rate an em- 
ployee high on all traits because of his superiority in only one trait, 
or the fear on the part of a supervisor that low ratings will indicate 
that he is a poor supervisor. These, and many other reasons that 
could be cited, make us wary of accepting as fact a skewed distribu- 
tion of ratings. 

But since these are the kinds of distributions we will usually get 
on numerical rating scales, what can we do about them? Three 
courses of action are open: we can discard them; we can modify
them; or we can ask supervisors to do the job over. Which course of
action we choose will depend upon the particular object we have in
mind. If our object is to get supervisors to understand the errors
made so that we can get a more correct set of ratings from them, we
shall choose the third alternative. If we feel that another technique
will prove suitable, we shall choose the first alternative. But if we
have secured the ratings for a one-time use, it is possible that we may
choose the second alternative. And by this alternative we mean some
process, such as regrouping the intervals, which would give more
nearly normal distributions. In our example this was the course
adopted. (See Table 149, Chap. 11.) The scale was changed from a
nine-step scale to a five-step scale. Step 9 was reassigned a value
of 5, step 8 was reassigned a value of 4, steps 7 and 6 were grouped
together and assigned a value of 3, step 5 was reassigned a value of 2,
and steps 4, 3, 2, and 1 were grouped together and reassigned a
value of 1. The regrouping does no violence to the original order of
the ratings, it looks more reasonable from a common-sense point of
view, and it lends itself a little more easily to treatment by normal
curve statistics.
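
The regrouping just described can be written as a small lookup table. The following Python sketch is an illustration only. The mapping REGROUP follows the text, but the sample of ratings is hypothetical and deliberately skewed; it is not the data of the study.

```python
from collections import Counter

# Regrouping described in the text: nine-step rating -> five-step rating.
REGROUP = {9: 5, 8: 4, 7: 3, 6: 3, 5: 2, 4: 1, 3: 1, 2: 1, 1: 1}


def regroup(ratings):
    """Convert a list of nine-step ratings to the five-step scale."""
    return [REGROUP[r] for r in ratings]


# Hypothetical, deliberately skewed ratings (NOT data from the study).
sample = ([9] * 10 + [8] * 20 + [7] * 25 + [6] * 20
          + [5] * 15 + [4] * 6 + [3] * 3 + [2] * 1)
after = Counter(regroup(sample))
print(sorted(after.items()))
```

In this invented sample the ratings pile up at steps 6 through 8 before regrouping and center on the middle step afterward, while no rating changes its order relative to any other.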

The ratings just discussed were solicited and received by mail.
The instructions were written, and accompanied the request for the
ratings. Let us turn, therefore, to a second situation in which this
rating form was used in a series of conference sessions during which 
raters were given careful oral instruction. The distributions resulting 
in this situation are decidedly improved over those secured in the 
unsupervised situation. Therefore, it appears that close supervision 
over the rater while he is making his ratings is a major factor in 
securing adequate distributions on a numerical-type rating scale. 

OSS. A third example of a combination numerical and descriptive-adjective
rating scale is that developed by the Assessment Staff of
the Office of Strategic Services and used by them in the studies we
shall describe in Chap. 14. This scale is given in Table 140. Like the
scales we have just discussed, the OSS scale consists of a series of
numbers, a series of descriptive adjectives, and a series of percentages
indicating the proportion of ratings expected in each step of the
scale. “One of the advantages of this scale is that it can easily be
converted into a two-point, three-point, or four-point scale, or,
by using pluses and minuses in marking, into an eighteen-point
scale. . . .” The OSS Assessment Staff developed this scale to
provide a summary of a large number of judgments, to transmute
clinical observations into a form amenable to statistical treatment,
and to provide a brief mode of communicating observational results
to other individuals.


Table 140. Rating Scale Developed by the OSS Assessment Staff*

5 | Very superior |  7%
4 | Superior      | 18
3 | High average  | 25
2 | Low average   | 25
1 | Inferior      | 18
0 | Very inferior |  7

* From Assessment of Men. New York: Rinehart & Company, Inc., 1948.
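
The conversions mentioned in the quotation can be illustrated with a short Python sketch. The expected percentages are those of Table 140. The particular groupings TWO_POINT and THREE_POINT, and the numbering used in eighteen_point, are our own assumptions; the source does not say how the OSS staff merged or subdivided the steps, and the four-point conversion is not attempted here.

```python
# OSS six-step scale: rating -> expected per cent of ratings (Table 140).
OSS_EXPECTED = {5: 7, 4: 18, 3: 25, 2: 25, 1: 18, 0: 7}

# Hypothetical groupings; the source does not say how steps were merged.
TWO_POINT = {0: "low", 1: "low", 2: "low", 3: "high", 4: "high", 5: "high"}
THREE_POINT = {0: "low", 1: "low", 2: "middle", 3: "middle", 4: "high", 5: "high"}


def collapse(expected, grouping):
    """Sum the expected percentages of the steps merged into each category."""
    totals = {}
    for step, pct in expected.items():
        label = grouping[step]
        totals[label] = totals.get(label, 0) + pct
    return totals


def eighteen_point(step, mark=""):
    """Place a rating such as 3, 3+ or 3- on a scale running from 1 to 18
    (an assumed numbering: minus, plain, plus within each step)."""
    offset = {"-": 0, "": 1, "+": 2}[mark]
    return step * 3 + offset + 1


print(collapse(OSS_EXPECTED, TWO_POINT))
print(collapse(OSS_EXPECTED, THREE_POINT))
```

With these groupings the two-point scale divides the expected ratings evenly, and the three-point scale places half of them in the middle category.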


RANKING METHODS 


The numerical methods of rating which we have just discussed 
have as one of their chief objects the spacing of subjects along a
series of equidistant steps or intervals. Many of them do not achieve
this hoped-for result, but this does not alter the fact that the steps
on a numerical rating scale are supposed to be placed at known
distances from each other.

There are few measurements in psychology, and there are none in 
the field of personality, that possess a very high degree of accuracy. 
Therefore, some psychologists assert that it is foolish for us to use 
numerical scales and other similar scales because we are fooled into 
thinking we have done some real measuring when in reality we have 
not. They contend that all we can do in psychology, and particularly 
in the field of personality, is to rank-order our subjects. Somebody is 
more this or less that, but we do not know how much more or how 
much less. This being the case, it is argued that we might just as well 
use ranking scales to start with, and it is our purpose in this section 
to present three studies in which ranking scales have played impor- 
tant roles. 
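
The argument for ranking can be made concrete with a brief Python sketch, offered as an illustration under our own assumptions. The function rank_order is hypothetical, tied scores are given the mean of the positions they occupy (one common convention among several), and the ratings are invented.

```python
def rank_order(scores):
    """Return ranks (1 = highest score); tied scores share the mean rank."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1          # best position held by this score
        last = first + ordered.count(s) - 1   # worst position held by this score
        ranks.append((first + last) / 2)
    return ranks


# Hypothetical ratings of five subjects on a nine-step scale.
ratings = [9, 7, 7, 4, 2]
# Any order-preserving change of units leaves the ranks untouched.
stretched = [r ** 2 for r in ratings]
print(rank_order(ratings))
print(rank_order(stretched))
```

Both lists of ranks are identical, which is the point of the argument: a rank order records who is more or less, and it makes no claim about how much more or how much less.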

Landis. The first of these studies was conducted by Professor 
Carney Landis and is reported in his book Sex in Development. 
Landis and his collaborators were interested in evaluating “the 
importance of . . . psychosexuality in psychopathology.” Their 
procedure was to study “the growth and development of emotional 
and sexual patterns of personality in two groups of women, one 
normal, the other psychotic or neurotic.” Landis and his coworkers 
were interested “in determining whether the psychosexual components
of the adult personality” can be considered as “the end
product of earlier incidents, events and relationships.” “The essential
object of their study . . . ” says Landis, “revolved around the
following questions: What is the normal (average) pattern of psychosexual
development? How do deviations in this pattern affect the
adult personality? What are the characteristics of psychosexual
development of different types of adult personalities?”

Landis defines the term psychosexual development as “the growth 
and changes in the biological, psychological, and sociological aspects 
of sex in the course of the life history of the individual.” We understand,
then, that Landis was interested in discovering the early and
developmental antecedents of the adult or mature stage of psychosexuality.
He was interested, also, in finding what departures from
the normal developmental sequence could occur and in what way
these departures from the normal or typical developmental sequence
affect the character of adult psychosexuality.

Interview Data. Landis secured his data via a controlled-interview
technique, supplemented with questionnaire forms and physical
examinations. The data came from 153 normal women and 142 
abnormal women. The material of chief interest for us is that secured 
in the controlled interview. In this interview a total of more than 50 
questions was put to each subject. These questions concerned “the 
facts and phantasies related to psychosexual development. The 
subjects were asked to describe both the incidents of their early 
lives and the emotional value of such incidents.” 

Each subject was asked these questions in a natural interview, and 
the interviewer took down near verbatim notes of all answers. Each 
question was pursued until the subject began responding with 
material irrelevant to the question involved. When all interviews 
had been completed, it was necessary for Landis and his coworkers 
to quantify the material in some meaningful and useful way. And 
this is our reason for being interested in Landis’s study. 

Quantification was accomplished, says Landis, by a careful reading 
of all case histories “in order to determine just what personality 
variables” could be evaluated. Then “the general divisions or steps 
of the scales were set up arbitrarily. The frame of reference for 
these arbitrarily set limits was the entire group of 295 individuals 
studied. . . . The number of steps in each scale of evaluation
depended upon the amount of discrimination appropriate to the
material at hand.” Further, “the steps or divisions on the evaluation
scales did not constitute a continuum nor was there any particular
attempt to establish equal intervals between the steps.”

We see here an important contrast with certain other scale-construction
techniques. For example, in the equal-appearing-interval
method of attitude-scale construction we assume a continuum
on which attitudes may be graded. And we assume that it is
possible for an individual to have an attitude at any position on the
scale from one extreme to the other. The scale-construction work
proceeds without reference to the empirical distribution of attitude
in a group. In contrast, the method used by Landis in the development
of his various psychosexual scales makes no assumption about
the general type of distribution to be expected. The empirical facts
are first secured, these are examined, and scales are constructed in
accord with whatever discriminations exist for the group being
studied.

The first method has the advantage, if it is otherwise acceptable,
that the steps involved transcend any particular group that may be
tested. It may have the disadvantage, of course, of not being suffi- 
ciently tied down to empirical data to be realistic. The second 
method has the advantage of being based directly upon empirical 
data but may have the disadvantage of being so restricted by the 
particular data at hand that the scales cannot be applied to other 
groups. 

Scale Development. Landis says that “after a particular personality 
concept or variable had been selected for analysis, the case histories 
of all individuals were reread and each step of the evaluation scale 
was defined in terms of the actual responses of the subjects.” Then 
these scales were used by three judges independently of each other. 
To eliminate or to minimize the effects of halo, a judge would read 
only that part of the case history which was pertinent and would not 
review the entire case. 

Final evaluations or ratings were not assigned to a subject unless 
two of the raters were in complete agreement and the third rater 
disagreed not more than one step. If this criterion could not be met 
for any given rating, the matter was settled in a group conference. 
According to Landis, only 5 per cent of several hundred ratings had
to be settled in group conference.
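
The agreement rule just described can be sketched in Python as follows. This is our own illustration and not Landis's procedure as published. The name settle_rating is hypothetical, and we assume that the final rating is the step on which the two agreeing judges concur, a point the source does not state explicitly.

```python
def settle_rating(a, b, c):
    """Apply the agreement rule to three judges' ratings on one scale.

    Returns the agreed step when two judges match exactly and the third
    is at most one step away; returns None when the case must go to a
    group conference.
    """
    for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
        if x == y and abs(z - x) <= 1:
            return x
    return None


print(settle_rating(3, 3, 4))  # two judges agree, third is one step away
print(settle_rating(3, 3, 5))  # third judge is two steps away
print(settle_rating(2, 3, 4))  # no two judges agree
```

Only the first of these three cases yields a rating; the other two would be referred to the group conference.
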

The criteria by which the adequacy of each psychosexual scale
was judged may be listed as follows:

1. The steps in each scale had to be sufficiently clear and exact to produce
consistency (as above defined) among the judges.
2. The steps in each scale had to provide “for differentiations” among the
subjects rated.
3. The steps in each scale had to differentiate between individuals “exhibiting
logically different reaction patterns.”

Altogether, 14 scales were developed. There were two scales with
three steps each, four scales with four steps each, and eight scales
with five steps each. An example of one of the scales is given in
Table 141.

Reliability. We can gain some idea of the reliability of these scales
by reference to the data presented in Table 142. The entries in this
table show the percentage of cases upon which, for each scale, all
three judges were in complete agreement. According to these data,
the scales for prepuberty sex aggressions (affective response),
masturbation (occurrence), and menstruation (affective response)
are the most reliable; and scales for early sex information (subjective
rating), homoerotism (occurrence), and sex adjustment in marriage
are the least reliable.


Table 141. Landis’s Psychosexual Scales*

1. Early sex information

Judges’ rating of factual adequacy
1. No instruction or information. Subject denies all knowledge of sex differences as a child.
2. Fantastic explanations by adults; inadequate information from playmates; childish
gossip; dirty stories; inadequate factual information (parents, schools).
3. Adequate factual information from parents, books, school hygiene lectures, etc.

Subjects’ rating of adequacy
1. Early information adequate, subject satisfied as to manner in which acquired whether
from parents or otherwise.
2. Early information from parents fairly adequate, but wish parents’ attitude had been
more frank. Felt she should have been given more information.
3. Parents gave no information, did not discuss, or gave inadequate information. No
apparent resentment toward parents for lack of frankness. Accepted as natural attitude,
no information from other sources. Would give her own daughter more information.
4. Parents gave no information, or inadequate information. Resentment toward parents’
attitude. Felt information should have come from parents rather than from other
sources.
5. Lack of information, misinformation, or extreme unpleasantness of sources of
information. Early information so disgusting she has never been able to view as pleasant. Too
much information when she did not want it. Disliked being told about sex.

* From Landis, C. Sex in Development. New York: Paul B. Hoeber, Inc., 1940.


Table 142. Percentage of Cases on Which All Three Judges Were in Complete
Agreement on Original Rating*

Prepuberty sex aggressions (affective response) . . . . . 79
Early sex information (subjective rating)
Menstruation (affective response) . . . . . . . . . . . . 62
Problem of family ties (importance) . . . . . . . . . . . 51
Homoerotism (occurrence)
Homoerotism (affective response)
Masturbation (occurrence) . . . . . . . . . . . . . . . . 64
Masturbation (affective response) . . . . . . . . . . . . 56
Heterosexual experience (affective response)
Narcissism (importance)
Masculine protest (importance)
General compatibility in marriage
Sex adjustment in marriage
General adjustment in marriage

* From Landis, C. Sex in Development. New York: Paul B. Hoeber, Inc., 1940.
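
The index of reliability used in Table 142 is simply the per cent of cases on which all three judges gave the same rating. A minimal Python sketch of the computation follows. The function name and the ten cases are hypothetical; they are not Landis's data.

```python
def percent_complete_agreement(cases):
    """Per cent of cases on which all three judges gave the same rating."""
    agreed = sum(1 for a, b, c in cases if a == b == c)
    return round(100 * agreed / len(cases))


# Hypothetical ratings by three judges on one five-step scale.
cases = [(1, 1, 1), (2, 2, 3), (4, 4, 4), (3, 3, 3), (5, 4, 5),
         (2, 2, 2), (1, 2, 1), (3, 3, 3), (4, 4, 4), (5, 5, 5)]
print(percent_complete_agreement(cases))
```

It should be noted that an index of this kind gives no credit for near agreement: a case on which one judge differs by a single step counts the same as a case of wide disagreement.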


Psychosexual Development. After a complete application of the 
scales to all case histories, Landis states: 


. . . the course of personality development of most of our group showed certain
common characteristics which made it possible to classify each subject in terms of
expected levels of development. We found that certain experiences, attitudes and
practices were common and usual at certain ages in most of these women. We con- 
sidered these findings as the basis of approximate age norms for the psychosexual 
development of the individual. A brief characterization of each of these levels is 
as follows: 

15 to 17 years. The average girl of 15 to 17 years has gone out with boys fairly 
frequently but has not had complete sex experience. Her relationship with boys 
has been carried to the extent of mild petting. She is still somewhat tied to her 
family emotionally, and economically has not reached an independent adult status. 
Her sex information is fairly complete, but she feels quite constrained in discussing 
such matters with her family. She still takes a good deal of pleasure in associating 
with girls of her own age but is definitely interested in boys, more as dates than in 
any more serious fashion. She has fairly strong emotional attachments to members 
of her family, and to her friends of either sex, but is not exclusively attached to 
any one of them as an individual.

An immature girl at this age evinces no interest in boys and restricts her interests
and activities to members of the family or to girl friends. She has a definitely
unfavorable attitude toward sex which may show itself in a complete lack of interest
or in disgust toward sexual matters. Her attachments to members of the family 
may be so strong that she has no interests or activities outside the home. 

18 to 21 years. The average girl between these ages may be expected to show the 
following characteristics. She has many friends and activities outside the family 
circle which keep her away from home a large part of the time. She is not yet wholly 
independent of her family financially, however. She has had a fairly complete 
knowledge of sex since she was 16 years old. She had gone out with six or more 
boys on different occasions, and feels that she is attracted to one of them, but has 
not been thinking specifically in terms of marriage. Her attitude toward sex is
one of healthy interest. She is not preoccupied with boys although more and more
of her time is spent in planning or day-dreaming about particular individuals.

The immature girl of this age has little interest in boys and has had few dates.
Her physical contact with them has never gone beyond kissing, and such intimacies
as have occurred have not been particularly enjoyable. She probably has a negative
attitude toward all sexual matters. Her sex information was not complete until
she was 18 years old, but she does not remember any strong curiosity about sex.
She probably masturbates occasionally and shows evidence of being primarily
interested in herself.

An extremely immature person of this age has never had any dates or any love
affairs, her sex information is still incomplete, and she masturbates frequently or
excessively. She is still very closely emotionally attached to her family, and
extremely narcissistic. Her attitude toward sex is one of disgust or apathy.

22 to 25 years. The mature woman has completely resolved her family ties and is
free from any pronounced signs of narcissism. Her heterosexual intimacies have
included some sex play or petting. She is free from any unfavorable sex attitudes.

The immature woman 22 years old or more had her first date after the age of 19
years and since that time has gone out with less than six men. Her physical intimacy
with men rarely has gone beyond a kiss. Her sex information was not complete
until after she was 18 years old. She may masturbate occasionally, show evidence
of narcissism and of a poorly resolved family situation. 

An extremely immature women of this age is one who has had no dates or attach- 
ments to men, and has not yet acquired complete sex information. She masturbates 
frequently or excessively, shows extreme narcissism, and close attachment to her 
parents. 


It now becomes possible to compare the psychosexual level of any 
given subject with these various stages and to determine whether
this person has reached a psychosexual level appropriate
for her age, or whether she is advanced or retarded with respect to
it. We can also compare different adult groups with each other: 
abnormal vs. normal, single vs. married, happily married vs. un- 
happily married, and so forth. It is also possible for us to determine 
the differential antecedents for abnormality in contrast with nor- 
mality, for remaining single in contrast with getting married, and 
for being unhappy in contrast with being happy. 

Mead. A second study in which ranking scales have played a 
major part is that conducted by Margaret Mead on Cooperation and 
Competition among Primitive Peoples. Mead’s purpose was to see if 
some insight could be gained relative to the factors which, on the 
one hand, appear to lead to cooperative behavior and which, on the 
other hand, appear to lead to competitive behavior. Her source 
material consisted of the studies of 13 primitive groups of people: 
the Arapesh of New Guinea, the Eskimo of Greenland, the Ojibway 
of Canada, the Bachiga of East Africa, the Ifugao of the Philippine 
Islands, the Kwakiutl of Vancouver Island, the Manus of the 
Admiralty Islands, the Iroquois, the Samoans, the Zuñi of New 
Mexico, the Bathonga of South Africa, the Dakota, and the Maori 
of New Zealand. Mead, with her associates Mirsky, Landes, Edel, 
Goldman, Quain, and Mishkin, prepared accounts of the behavior 
of each of the afore-mentioned groups and then undertook to analyze 
each culture from the standpoint of those factors relevant to co- 
operative and competitive behavior. 

Mead prepared a series of questions and asked each of her asso- 
ciates, as well as herself, to attempt to find an answer to each of these 
questions for the various groups studied. To give the reader an 
account of the nature of these questions, we can do no better than 
to quote Mead herself. We give a portion of her list of questions 
below and remind the reader that these and similar questions in
five other areas (social organization, political structure, social
structure, view of life, and educational process) had to be kept in
mind as Mead and each of her associates prepared their accounts
of the 13 primitive societies. With reference to the study of economics,
Mead asked each of her associates to


Watch closely the correspondence between the group habits and the actual 
economic conditions, that is, the amount of genuine environmentally determined 
social cooperation which is necessary. Do they use boats needing several people to 
build them, to man them? Try to estimate the extent to which environment 
dictates cooperation, and at what ages, in what activities, etc., so that later you 
can form a judgment on whether this factor is important in dictating habits in 
other spheres. For instance, are the customary occupations of women solitary
or social as compared with the men? Distinguish carefully between individual
activities performed in groups as when a group of women meet to weave mats
together, and activities which require each member of the group actually to contribute
to one end, as in a drive of game or fish. (In mat weaving, for instance,
the women may meet to make mats for the dowry of the chief’s daughter; they
are engaged in social cooperation, but there is nothing in the actual work they
are doing which dictates cooperation, and in fact each one may compete with
the other to make the best mat, or work the fastest.) When an activity does require
group effort, as a fishing drive, is the contribution of individual effort assessed
individually so that the actively existing cooperation is given no social expression?
(E.g., when four men fish in the canoe which belongs to each man, and the owner
of the canoe receives two-fifths of the catch, each other man a fifth; or when the
fish is devoted without comment to some communal purpose.) How many differential
skills are involved in the economic life? Is a high quality of skill demanded
in any particular activity? Is it socially recognized? Is there a different organization
of behavior in the activities demanding skill and those which do not?

Is the food supply plentiful, seasonal, unreliable? Does the acquisition of food
depend upon skill, luck, foresight, aggressiveness in securing a share of a fixed
supply? Is the absence of cooperative effort, or the absence of a partner in cooperative
effort, such as a wife or parents, economically penalized in the society? Are
other materials besides food—such as wood, clay, metal—limited in supply, difficult
to secure, etc?

Is the community self-contained? Is there division of labor? Is there dependence
on trade? How is the trading situation organized: cooperation between members
of the community as over against members of other communities, or cooperation
between trade partners, across community lines? Is this extended to groups? Among
manufactured articles how much differentiation is there between the items of a
given type, in value, beauty, etc? What considerations are used to evoke trade,
free barter, compulsive barter (where one subject will be traded only for another
of a particular type): magical compulsion, maintenance of alliances, etc?

What are the property arrangements? Here note particularly whether individuals
own a share of ground, but not a definite piece of ground, etc. Does one inherit
property, or the right to share in a cooperative agricultural group, etc? What are
the rules in regard to newly created property as over against old property? What
are the proportions between old property which is inherited and new property
which an individual can create by skill, industry, social manipulation, etc? Is
property perishable?

What is the position of the skilled worker? Is he set apart, given different rewards
from others? Do skilled workers compete or cooperate among themselves?

What is the nature of the economic activity? Does it require long-time planning,
unremitting daily activity? Does a sick man fall inevitably behind? Are misfortunes
individual—like the loss of a valuable fishing trap—or communal in incidence
—as a crop failure? Can you estimate the time spent in different kinds of activities,
cooperative and competitive?

Conclusions. When all accounts had been prepared, Mead attempted
a classification of the societies upon three variables: cooperative,
competitive, and individualistic. We can show her results,
as Mead did herself, in the form of a diagram (see Fig. 10).

Fig. 10. The individual, cooperative, and competitive behavior classification of 13
primitive societies. (From Mead, M. (Ed.) Cooperation and Competition among
Primitive Peoples. New York: McGraw-Hill Book Company, Inc., 1937.)

The Kwakiutl are listed as most competitive, the Ojibway and
Eskimo as most individualistic, and the Zuñi and Bathonga as most
cooperative. From each of these extreme positions the other societies
shade off in mild gradation, as in most other types of classifications
of human behavior.

Having classified the 13 primitive societies upon the relative
predominance of individualistic, cooperative, or competitive be- 
havior, Mead studied each group with respect to character formation 
and ego development. In other words, she wished to use the variables 
individualistic, cooperative, and competitive as her predictors, and 
character formation and ego development as her predictands. Let 
us see what conclusions she reached. 


1. “Strong ego development can occur in individualistic, competitive or cooperative
societies.” In other words, the characterization of a society as individualistic,
as competitive, or as cooperative is of no value in predicting the type of ego
development. Its correlates must be looked for elsewhere.

2. “Whether a group has a minimum or a plentiful subsistence level is not directly
relevant to the question of how cooperative or competitive in emphasis a culture
will be.” Here Mead was interested in subsistence level as a predictor and in the
variables cooperative, competitive, and individualistic as predictands. No useful
relationships were discovered.

3. “The social conception of success and the structural framework into which
individual success is fitted are more determinative than the state of technology or
the plentifulness of food.” In other words, social conceptions of success and structural
framework as just defined can be used as useful predictors and will indicate
to some degree whether the culture can be classed primarily as individualistic,
cooperative, or competitive in nature.

4. “There is a correspondence between: a major emphasis upon competition, a
social structure which depends upon the initiative of the individual, a valuation of
property for individual ends, a single scale of success, and a strong development of
the ego.”

5. “There is a correspondence between: a major emphasis upon cooperation, a
social structure which does not depend upon individual initiative or the exercise
of power over persons, a faith in an unordered universe, weak emphasis upon rising
in status, and a high degree of security for the individual.”


To classify these 13 primitive societies, Mead considered the
following questions:

What are the principal ends to which an individual devotes his time? What are
the principal ends to which group activities are directed? What are the proportions
of time and energy devoted by individuals and by groups to ends which are shared,
competitive, and individual? [Mead states that] the application of these criteria
is admittedly rough since it involves a judgment of more or less upon data themselves
incomparable. . . . Nevertheless the range of difference in the thirteen
cultures in these aspects of life was so great, and the extremes of the gamut so
clear, that all those engaged in the study were unanimous in their agreement.

Objectivity. In anthropological studies data on objectivity are
extremely hard to obtain and, as in the study just discussed, are 
seldom given. Because such data are hard or even impossible to 
get does not, however, mean that we should, as many do, ignore 
the subject altogether. Let us review Mead’s study with a view to 
determining the points on which we should concern ourselves with 
the factor of objectivity. 

The first point is in connection with the informants, the natives 
of the primitive groups giving the anthropologist his information. 
If the anthropologist had interviewed a different set of informants, 
would he have secured the same information? The only way an 
anthropologist can be sure of this is to interview several informants 
independently of each other and see if the data he gets from these 
various sources is the same. If it is, he can have some assurance that 
his informants have been objective. 

The second place at which we need to show concern is in the 
observations which the anthropologist himself makes. Would an- 
other anthropologist have seen the same things and have interpreted 
them in the same way? The only way to be sure is to have another 
anthropologist make a parallel study, and this, of course, is expen- 
sive and is seldom possible. It must be understood, however, that 
unless such studies can be made, the anthropologist’s observations 
and interpretations cannot be said to be beyond doubt as to their 
objectivity. This is a serious matter, for there is abundant evidence
to indicate that different observers, equally well-trained from a
technical standpoint, will persist in seeing different things and in
interpreting the same things seen in different ways.

The third point where we need to concern ourselves with objectivity
in Mead’s study is with the reinterpretation of a study by an
anthropologist not involved in securing the original data. The reasons
for our concern here are much the same as those mentioned in
the preceding paragraph.

And last we might be concerned with Mead’s own over-all interpretation
of the results. Would another equally competent anthropologist
have interpreted the data in the same manner that Mead
did? Perhaps so, but there is presented no proof that this would
have been the result.

Our purpose in raising the foregoing questions on objectivity is not
to lessen our faith in, and regard for, Dr. Mead as anthropologist.
Our purpose, however, has been to point out the places in which
anthropological studies can be lacking in objectivity and so make 
us all aware and ever on the alert to see that due care has been taken, 
whenever possible, to provide proof for the objectivity of the data 
which are presented. 

Reliability. Mead presents no data on reliability. This would refer, 
of course, to the extent to which upon a subsequent occasion, or by
another observer, the 13 primitive societies would be ranked in the 
same way. This would be impossible for Mead herself to carry out, 
for she could not, undoubtedly, forget completely her first classifica- 
tion. And since she could not do this, she is not in a position to 
render a completely independent classification. The only way to get 
at the reliability of the classifications would be to have another 
observer make a separate classification. And this too would probably 
be impossible, for any anthropologist capable of making such an 
independent classification would, in the course of his early training, 
have familiarized himself with Dr. Mead’s studies. Therefore all 
we can do is point to the problem but have no solution to offer as 
to a way in which appropriate reliability data could be obtained. 

Validity. Here we are concerned with the accuracy of the results. 
Do our classifications correspond with actual fact as determined in 
some completely independent manner? Here, again, we are stymied
in the solution of the problem, but here, again, we cannot ignore its
existence.


THE PAIRED-COMPARISONS METHOD 


The rating methods we have been discussing require us to make a
series of judgments on a number of isolated stimuli. If we have the
problem of rating employees on proficiency, adolescent girls on
psychosexual development, and primitive societies on cooperative
and competitive behavior, we consider our stimuli one at a time and
assign a number, a rank order, or a letter grade, and then go on to
the next. The judgments we are called on to make are frequently
not easy, but the procedure is simple and straightforward.

We can point to several disadvantages, however. If we have a large
number of stimuli to rate, we may not be able to keep in mind exactly
the same set of standards throughout all our ratings. One case may
call to mind certain items of information or behavior which we
forget to reconsider in rating a second case. And the standards we
use one time may be higher or lower than those we apply a second
time. Also, any of the methods we have discussed suggests that we
keep in mind some “absolute” standard which we proceed to apply
case by case.

A method designed to overcome some of these difficulties is the 
paired-comparisons method of rating. This method, as its name 
implies, requires that stimuli be rated in pairs and not one by one. 
It is thought that this pairing of stimuli makes it possible for a 
rater to give a more accurate judgment than any technique requiring 
him to judge stimuli, one by one, in terms of some previously estab- 
lished standard. 

The paired-comparisons method of rating requires the rater to 
compare each stimulus to be rated with each of the other stimuli to 
be rated and to render a judgment as to which stimulus in each pair 
is the larger, the better, the preferred, and so on. For example, if 
five people are to be rated on courtesy, it is necessary that the rater 
compare each individual with every other individual and for each 
of these pairs render a judgment as to which individual in the pair 
is the more courteous. This requires the rater to make 10 judgments. 
If 10 people were to be rated, the rater would have to make 45 
judgments. The number of judgments required for any given number 
of stimuli can be determined by the formula N(N — 1)/2, in which 
N is the number of stimuli to be rated. 
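
The count given by this formula can be checked by listing the pairs outright. The following is a minimal sketch in Python; the function names and the five ratees are hypothetical and serve only as illustration.

```python
from itertools import combinations


def number_of_judgments(n_stimuli: int) -> int:
    """Number of paired comparisons for N stimuli: N(N - 1)/2."""
    return n_stimuli * (n_stimuli - 1) // 2


def all_pairs(stimuli):
    """Every unordered pair of stimuli, each pair listed exactly once."""
    return list(combinations(stimuli, 2))


# Hypothetical ratees, used only to illustrate the count.
people = ["A", "B", "C", "D", "E"]
pairs = all_pairs(people)

print(number_of_judgments(5))   # 10
print(number_of_judgments(10))  # 45
print(len(pairs))               # 10
```

The number of judgments grows roughly with the square of the number of stimuli, which is why the method becomes laborious for long lists.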

Variations. There are several methods by which paired-comparisons
ratings can be secured. One method is to list all pairs and to ask
the rater to underline or circle the preferred member of each pair.
A second method is to present the pairs on 3- by 5-inch cards; and
still another method is to use a diagram such as that presented in
Fig. 11. The first of these methods possesses the advantage that the
pairs can be randomized, and this gives the experimenter some
control over the order in which the rater will consider the various
pairs of stimuli. The second method has the advantage of being
useful in a shop or industrial situation when the investigator wishes
to present for consideration one stimulus pair at a time and to have
the rater indicate his choice before moving on to the next pair. The
experimenter can shuffle his cards to have them in a new random
order for the next rater. The last method has the advantage of being
easy to prepare, and for this reason it is suitable for use in a conference
situation, e.g., when 20 or 30 raters are instructed simultaneously
and when each rater has a different set of stimuli to rate.

Treatment of Results. We can begin our explanation of the details
of the paired-comparisons method with this last variation. The
rater is instructed to place an X in the square under the better man
and to the right of the poorer man. When he has completed his consideration
of all individual pairs, he is asked to count the number of
X’s in each column and the number of X’s in each row. This gives
him a check on accuracy. For the number of checks in the column
plus the number of checks in the corresponding row should be equal
to one less than the number of stimuli available for comparison.

If the job performance of the Agent listed at the top of the diagram is better than the job performance
of the Agent listed at the side of the diagram, place an X in the square under the name of the Agent listed
at the top of the diagram and to the right of the Agent listed at the side of the diagram.

[Chart labels: “These Agents are more proficient than:”, “Name of Agent”, “These Agents are less proficient than:”, “Total”.]

Fig. 11. A chart to facilitate paired comparisons ratings. (From Agents’ Experimental
Performance Ratings. New York: Metropolitan Life Insurance Company, 1950.)


The column sums indicate the number of times that each stimulus 
is preferred or is judged to be better than each of the other stimuli. 
A stimulus with a column total of 10 is a better or a more frequently 
preferred stimulus than one with a column total of 8, and so forth. 
If all we want is a rank order of preference, we can assign a rank of 1 
to the stimulus with the highest column total, a rank of 2 to the 
stimulus with the second highest column total, and so on. 
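
The tallying, the accuracy check, and the rank ordering just described can be sketched as follows. This is an illustration only: the way the judgments are stored (a mapping from each pair to its preferred member), the function names, and the four-person example are all assumptions of ours, and tied column totals are not given any special treatment.

```python
def tally(stimuli, winners):
    """Count the X's per column and per row.

    winners maps each pair (a, b) to the preferred member of that pair.
    """
    column = {s: 0 for s in stimuli}  # X's under a name: times preferred
    row = {s: 0 for s in stimuli}     # X's beside a name: times not preferred
    for (a, b), preferred in winners.items():
        loser = b if preferred == a else a
        column[preferred] += 1
        row[loser] += 1
    return column, row


def totals_are_consistent(stimuli, column, row):
    """Column total plus row total must equal one less than the number of stimuli."""
    n = len(stimuli)
    return all(column[s] + row[s] == n - 1 for s in stimuli)


def rank_order(column):
    """Rank 1 goes to the highest column total (ties broken arbitrarily)."""
    ordered = sorted(column, key=column.get, reverse=True)
    return {s: rank for rank, s in enumerate(ordered, start=1)}


# Hypothetical judgments on four ratees.
stimuli = ["A", "B", "C", "D"]
winners = {
    ("A", "B"): "A", ("A", "C"): "A", ("A", "D"): "A",
    ("B", "C"): "B", ("B", "D"): "B", ("C", "D"): "C",
}
column, row = tally(stimuli, winners)
print(column)                                       # {'A': 3, 'B': 2, 'C': 1, 'D': 0}
print(totals_are_consistent(stimuli, column, row))  # True
print(rank_order(column))                           # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
```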

But the paired-comparisons method was designed to give us much 
more than a mere rank order. It was designed to give us the actual 
scalar separations between our stimuli. To illustrate how we may 
arrive at these scalar separations, let us start with the data given 
in Table 143. This table shows how a manager of a district life
insurance office rated 10 of his agents in comparison with each other.
He gave agent A the highest number of votes, agent B the second
highest number of votes, and so on.

TABLE 143. The Statistical Treatment of Paired-comparisons Ratings

Agent | Votes | Percentage | Standard score
  A   |   9   |     95     |       82
  B   |   8   |     85     |       70
  C   |   7   |     75     |       63
  D   |   6   |     65     |       58
  E   |   5   |     55     |       52
  F   |   4   |     45     |       48
  G   |   3   |     35     |       42
  H   |   2   |     25     |       37
  I   |   1   |     15     |       30
  J   |   0   |      5     |       18


The first thing which we have to do is to convert these ratings into
percentages of the maximum possible rating. Guilford suggests that
we do this by the formula % = (c + N/2)/nN. In this formula c
represents the total number of votes given to a stimulus, N represents
the number of judges, and n represents the number of stimuli.
When only one judge’s ratings are required, our formula reduces to
% = (c + ½)/n. In our example we shall assume we have only one
judge for each set of stimuli. This will simplify our discussion.

Agent A, with nine votes, has a percentage rating of 95, agent B,
with eight votes, secures a percentage rating of 85, and so on. Next,
we must convert these percentages into standard scores. This we do
by means of a table prepared by Hull. Long ago, Professor Hull
showed how rank orders or percentage ratings could be translated
into a series of standard scores ranging from 0 to 100. We shall
not reproduce Hull’s table here, but we shall merely give the results
of its application to our data. These results are presented in column
3 of Table 143.
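
The percentage conversion described above can be reproduced directly from the votes in Table 143. The sketch below assumes the formula in the form (c + N/2)/nN, which reduces to (c + ½)/n for a single judge, and multiplies by 100 to express the result as a percentage; the function name is our own.

```python
def percentage_rating(votes, n_stimuli, n_judges=1):
    """Guilford's conversion, (c + N/2)/(nN), expressed as a percentage."""
    return 100 * (votes + n_judges / 2) / (n_stimuli * n_judges)


# Votes from Table 143: one judge, ten agents.
votes = {"A": 9, "B": 8, "C": 7, "D": 6, "E": 5,
         "F": 4, "G": 3, "H": 2, "I": 1, "J": 0}

percentages = {agent: percentage_rating(c, n_stimuli=10) for agent, c in votes.items()}
print(percentages["A"], percentages["B"], percentages["J"])  # 95.0 85.0 5.0
```

Adding half a vote per judge keeps the top-ranked stimulus below 100 per cent and the bottom-ranked stimulus above zero, so that every percentage can later be converted to a finite standard score.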

The conversions that Hull’s table effects give a series of standard 
scores with a predetermined mean of 5.0 (or 50) and a predetermined 
standard deviation of 2.0 (or 20). In terms of this scale, we can now 
visualize actual scale separations, and this we could not do from the 
original votes in Table 143. If we compare the final standard scores 
with the original votes, we shall see that we have merely stretched 
out both ends of the scale in relation to its more central portion and 
that the farther out the scale we proceed, the more stretching we 
have done. This stretching is considered legitimate in view of our 
common assumption that the steps at the extremes of a distribution 
are usually broader than those in the more central parts of the scale. 
Our original votes do not show this, so we go through the procedures 
we have described to secure this result. 
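
Since Hull’s table is not reproduced here, the stretching effect can be illustrated with a substitute: the normal deviate corresponding to each percentage, rescaled to a mean of 50 and a standard deviation of 20. This is an approximation of our own choosing and not Hull’s table itself; for the data of Table 143 it agrees with the tabled standard scores to within about one point.

```python
from statistics import NormalDist


def approx_standard_score(percentage, mean=50.0, sd=20.0):
    """Normal-deviate stand-in for Hull's table (an assumption, not the table)."""
    z = NormalDist().inv_cdf(percentage / 100.0)
    return mean + sd * z


# Percentages and tabled standard scores from Table 143.
TABLE_143 = {95: 82, 85: 70, 75: 63, 65: 58, 55: 52,
             45: 48, 35: 42, 25: 37, 15: 30, 5: 18}

for pct, tabled in TABLE_143.items():
    print(pct, tabled, round(approx_standard_score(pct), 1))
```

The equal ten-point steps in the percentages become unequal steps in the standard scores, small near the middle of the scale and large at the two ends.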

Basic Assumptions. To understand the nature of the assumptions
required to provide a rationale for the foregoing calculations, we
shall have to take a brief excursion into psychophysics. Suppose that
we have a series of stimuli: S1, S2, S3, S4, and S5, and wish to
present them to a group of subjects and evoke their responses. For
the sake of our illustration, we shall suppose that the proper responses
to our stimuli are R1, R2, R3, R4, and R5, respectively. In
other words when we present stimulus S1, we expect response R1.
When we present stimulus S2, we expect response R2, and so on. We
know, however, that because of the variability of human nature,
chance, and other factors, we do not always get the response we
expect. So once in a while in response to stimulus S3, we shall get
response R2 or R4 instead of R3. But if we can exert sufficient control
over the conditions under which the stimuli are presented and
over the situation in which the responses are to occur, we can postulate
that in response to stimulus S3 we shall most frequently get
response R3. Responses R2 and R4 will occur but less frequently
than R3. And responses R1 and R5 will also occur but much less
frequently than R2 and R4. In fact we can expect a normal distribution
on the response continuum for each stimulus on the stimulus
continuum. Thurstone calls the response which occurs most frequently
for any given stimulus the modal discriminal process, and
the standard deviation of all responses about this modal response he
calls the discriminal dispersion.

It is to be understood that there is a modal discriminal process and
a discriminal dispersion for every stimulus on our stimulus continuum.
A plot of these results for two stimuli is given in Fig. 12.


Fig. 12. The modal discriminal processes and discriminal dispersions for two stimuli.

We know from our elementary statistics that any difference can
be evaluated in terms of its standard error. And by the standard
error of a difference we mean a measure of the dispersion of all
possible differences about the difference we have in hand. If our
standard error is small, our obtained difference will be considered a
more significant (that is, a larger) difference than if our standard
error is large. This suggests that we can use the standard error of a
difference as a measure of the extent to which two stimuli on our
stimulus distribution depart from each other. The formula which we
have under consideration is as follows:

$$S_b - S_a = X_{ba}\sqrt{\sigma_1^2 + \sigma_2^2 - 2r\sigma_1\sigma_2}$$

To get the standard error of a difference we must know the standard
error of the mean of each of the two distributions and the correlation
between the individual pairs of the two distributions. It is frequently
not possible for us to get this information when all we have is a . . .
has demonstrated that . . .

The first assumption is that there is no correlation between the
responses made to different stimuli. This eliminates the factor
$2r\sigma_1\sigma_2$. The second assumption is that the discriminal dispersions
for different stimuli are equal to each other. This enables us to
reduce our formula to $S_b - S_a = X_{ba}\sqrt{2\sigma_1^2}$. In this formula, $S_b - S_a$
represents the difference in scale values between two stimuli, $X_{ba}$
equals “the deviate corresponding to the proportion of judgments
$R_b$ greater than $R_a$,” and $\sigma_1$ represents one discriminal dispersion.
Now if we let $\sigma_1$ become our unit of measurement, our formula
becomes $S_b - S_a = X_{ba}\sqrt{2}$. This formula requires only that we
multiply each of our deviates $X_{ba}$ by 1.414.
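
The simplified formula can be sketched in a few lines. The example assumes that we already have, for a pair of stimuli, the proportion of judgments in which the second was placed above the first; that proportion is converted to a normal deviate and multiplied by the square root of 2. The function name and the sample proportions are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def scale_separation(proportion_b_over_a):
    """Scale difference between two stimuli, in units of one discriminal dispersion.

    Assumes uncorrelated responses and equal discriminal dispersions,
    the two assumptions stated in the text.
    """
    x_ba = NormalDist().inv_cdf(proportion_b_over_a)  # deviate for the proportion
    return x_ba * sqrt(2)


# Hypothetical proportions of judgments "b greater than a".
for p in (0.50, 0.60, 0.75, 0.90):
    print(p, round(scale_separation(p), 3))
```

A proportion of .50 yields a separation of zero, and proportions equally far above and below .50 yield separations that are equal in size and opposite in sign.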

A Combination Method. The author has used a rating form that 
utilizes both numerical ratings and paired-comparisons ratings. This 
rating form consists of three work sheets. The first is an acquain- 
tance-rating form, the second is a numerical rating form, and the 
third is a paired-comparisons rating form. We have already discussed 
the last two types of rating, so the acquaintance-rating form is the 
only thing that is new to us. On this form the rater indicates whether 
he is extremely well acquainted, moderately well acquainted, only 
slightly acquainted, or not at all acquainted with each ratee. 

There are six steps involved in the use of the combination rating 
form. These are the securing of acquaintance ratings, the securing 
of numerical ratings, the securing of paired-comparisons ratings, the 
correction of the numerical ratings by their comparison with the 
paired-comparisons ratings, the converting of the paired-compari- 
sons ratings into standard scores, and the averaging of the standard 
paired-comparisons scores and the numerical ratings. 

The application of steps 1, 2, and 3 is obvious, since we have
already explained them. Step 4, however, requires additional comment.
When a rater has completed both steps 2 and 3, he is asked to
compare the two sets of ratings. He is instructed to do this in the
following way:



1. By means of the column totals entered at the bottom of the paired comparisons
diagram identify the person to whom the greatest number of votes (X’s) has been
given. Then find the numerical rating assigned to this person.

2. By means of the column totals entered at the bottom of the paired comparisons
diagram locate the person to whom the second largest number of votes (X’s) has
been given. And then find the numerical rating assigned to this person. This rating
will be found to be (a) higher than, (b) the same as, or (c) lower than that assigned
to the person identified in step 1. If contingency (a) occurs the numerical rating is
too high. It should be adjusted by reducing it to a value which is the same as, or
lower than, that assigned to the person identified in step 1.

3. By means of the column totals entered at the bottom of the paired comparisons
diagram locate the person to whom the third largest number of votes (X’s) has
been given. And then find the numerical rating assigned to this person. This rating
. . . until complete consistency between
the paired comparisons ratings and the numerical ratings is achieved.
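
The checking procedure in these instructions can be sketched as a single pass down the paired-comparisons order. The sketch rests on assumptions of our own: that a higher numerical rating means a better ratee, that a rating found to be too high is reduced to exactly the value of the person ranked just above (the instructions also allow a lower value), and that no two column totals are tied. The names and figures are hypothetical.

```python
def adjust_numerical_ratings(column_totals, numerical):
    """Lower any numerical rating that exceeds that of a person with more votes."""
    # Work from the largest column total (most votes) to the smallest.
    order = sorted(column_totals, key=column_totals.get, reverse=True)
    adjusted = dict(numerical)
    ceiling = None  # rating of the person just above in the vote order
    for person in order:
        if ceiling is not None and adjusted[person] > ceiling:
            adjusted[person] = ceiling  # contingency (a): rating was too high
        ceiling = adjusted[person]
    return adjusted


# Hypothetical ratings: B was given a higher number than A despite fewer votes.
column_totals = {"A": 3, "B": 2, "C": 1, "D": 0}
numerical = {"A": 7, "B": 8, "C": 6, "D": 6}
print(adjust_numerical_ratings(column_totals, numerical))
# {'A': 7, 'B': 7, 'C': 6, 'D': 6}
```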


When the paired-comparisons ratings have been checked for
internal consistency and have been used to adjust the numerical
ratings, composite ratings are computed. To do this the paired-comparisons
. . . assigned when one rater is required
to rate more than 15 persons (the maximum number for which
provision is made on the rating form). The treatment to be discussed
. . . comparisons ratings. Whenever it is necessary
. . . greater number of discriminations in the numerical ratings. And it
allows the mean of the distribution of the numerical ratings (when
corrected) to be used as an anchoring point to equate different groups
of ratees with each other.

This method requires an adequate preliminary training period and 
supervision and help during the completion of the forms. These 
factors are sometimes considered to be disadvantages, but any 
procedure that is expected to yield valid, objective, and reliable 
results is worth the effort, training, and supervision required. 


11


RATINGS: ANALYTICAL APPROACHES 


The rating methods we discussed in Chap. 10 were nonanalytical in 
nature. They enable us to make classifications, but they do not 
provide supporting reasons for these classifications. Nevertheless, 
they serve a useful purpose because there are many situations in 
which we can be satisfied with classifications without supporting 
reasons. Many times, however, we need these supporting reasons.
So we turn in this chapter to methods of rating that can give us
supporting reasons in addition to classifications. We shall be dealing
with what we call analytical methods of rating.


THE FORCED-CHOICE TECHNIQUE 


One of the major difficulties or disadvantages of the rating techniques 
that we discussed in Chap. 10 is that they can be manipulated 
by the rater to give any result that he wishes to produce. It would 
be most advantageous, therefore, if we could find a method that 
could not easily be manipulated by the rater. Unfortunately there 
are no such techniques, but one which has been alleged to suffer 
somewhat less from rater manipulability than others is the forced-choice 
rating technique. It will be useful for us to examine this 
method to see in what way the attempt is made to reduce the extent 
to which the results can be manipulated by the rater. We shall 
describe the scale which was developed and used by the United 
States Army. 

The essential and important part of this rating scale consists of 
24 sets of adjectives, phrases, or statements such as: 


A. Commands respect by his actions 
B. Coolheaded 
C. Indifferent 
D. Overbearing 


The rater must choose from each such set the adjective, state- 
ment, or phrase that best describes the person being rated and also 
indicate the adjective, statement, or phrase that least describes the 
person being rated. Thus of the four alternatives presented, the 
rater is forced to choose two of them, one as descriptive and the 
other as nondescriptive. Now there would be no point in forcing a 
rater to choose among the alternatives offered unless we knew ahead 
of time something about the fundamental nature of the choices 
offered. Therefore let us examine the nature of these choices and see 
in what way it is thought they disguise to some extent the ratings 
which will indicate the highest or lowest degree of proficiency. 

We would certainly guess, in reviewing our example, that the first 
two items, “commands respect by his actions” and “coolheaded,” 
would be considered favorable to the individual being rated and 
that the last two items, “indifferent” and “overbearing,” would be 
considered unfavorable. We would be wrong, however, because only 
one of the first two items is really favorable and only one of the 
last two items is really unfavorable. How is this result achieved? We 
can understand this only by reviewing in detail all steps leading to 
it. Rundquist and Sisson give these steps as follows: 


1. Army officers were asked to write brief essay type descriptions of other officers. 
They were asked to write essays describing both successful and unsuccessful officers. 
2. When these essays had been prepared they were reviewed in detail and upon 
the basis of the material contained in them a series of descriptive adjectives, phrases, 
or statements was prepared. Some of these described successful officers and the 
remainder described unsuccessful officers. 

3. This list of descriptive adjectives, phrases, and statements was given to a group 
of army officers who were instructed to select other officers well known to them and 
to indicate the extent to which each of the statements characterized each of these 
officers. Each statement could be said to apply to an officer to an exceedingly high 
or to the highest possible degree, to an unusual or outstanding degree, to a typical 
degree, to a limited degree, or to a slight degree or not at all. 

4. For each statement, a preference value and a discriminative value were determined. 
The first of these indices was computed to show the extent to which the 
alternative answers to each statement were used to describe officers in general 
regardless of any differences in proficiency among them. The discriminative value 
was determined to show the extent to which the alternative answers were said to 
apply to successful officers in contrast to unsuccessful officers. The computation of 
these indices is illustrated in Table 144. 


TABLE 144. The Computation of Preference and Discriminative Values* 

Response                        1     2     3     4     5    Sum 
Weight                          0     1     2     3     4 
Frequency: 
  Upper third                   1     0     6     6    87    100 
  Middle third                  3     5    13    15    64    100 
  Lower third                   4    11    27    23    35    100 
Total frequency                 8    16    46    44   186    300 
Frequency × weight              0    16    92   132   744    984 
Difference in frequency for 
  upper and lower thirds        3    11    21    17    52    104 

* From Sisson, E. D. Forced choice—the new Army rating. Person. Psychol., 1948, 1, 365-381. 


The ratees are first divided into three groups: an upper third, a middle third, 
and a lower third. Then for each of these groups the frequency of occurrence for 
the alternative answers is ascertained. Each of these frequencies is multiplied by 
an appropriate weight and these products are added and their sum is taken as the 
preference index. In our example, alternative 1 is used only 8 times; alternative 2 
is used 16 times; alternative 3 is used 46 times; alternative 4 is used 44 times; and 
alternative 5 is used 186 times. It is obvious that alternative 5 is the most popular 
answer and is applied frequently to all officers regardless of any differences in merit 
that may obtain. The calculation of the preference index is as follows: 

(8 × 0) + (16 × 1) + (46 × 2) + (44 × 3) + (186 × 4) = 984 

This item has a high preference index and therefore is not a particularly popular 
item. 

The discriminative index is computed by getting the sum of the differences in 
the frequencies applying to the upper and lower third of the group rated. In our 
example this value is 104, which is the sum of the differences for each alternative 
(3 + 11 + 21 + 17 + 52 = 104). 

5. Next it is necessary to select pairs of adjectives, phrases or statements (these 
can be mixed) such that two of equal preference value but different discriminative 
value are selected. These pairs are then grouped in tetrads (two of the elements have 
high preference value and two of the elements have low preference value). 

6. The items are then tried out on new groups of officers and the preference values 
and discriminative values redetermined. This step is necessary for assurance that 
the grouping of items into pairs and tetrads does not do anything to change their 
values from those originally determined. 
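
The two indices described in step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the upper-third frequencies are those printed in Table 144, the middle- and lower-third frequencies are inferred from the totals and differences quoted in the text, and the function names are ours. We also assume that the "differences" are taken without regard to sign, since the quoted values (3, 11, 21, 17, 52) are all positive.

```python
# Response weights 0-4 for alternatives 1-5, as in Table 144.
WEIGHTS = [0, 1, 2, 3, 4]

# Frequencies of alternatives 1-5 within each third of the ratees.
# UPPER is printed in Table 144; MIDDLE and LOWER are inferred from the
# totals (8, 16, 46, 44, 186) and differences (3, 11, 21, 17, 52) in the text.
UPPER = [1, 0, 6, 6, 87]
MIDDLE = [3, 5, 13, 15, 64]
LOWER = [4, 11, 27, 23, 35]


def preference_index(groups, weights):
    """Sum of (total frequency x weight) over all response alternatives."""
    totals = [sum(column) for column in zip(*groups)]
    return sum(f * w for f, w in zip(totals, weights))


def discriminative_index(upper, lower):
    """Sum of the unsigned upper-versus-lower frequency differences (assumed)."""
    return sum(abs(u - l) for u, l in zip(upper, lower))


pref = preference_index([UPPER, MIDDLE, LOWER], WEIGHTS)
disc = discriminative_index(UPPER, LOWER)
print(pref, disc)
```

Whether the original procedure rescaled the preference sum (for example, by dividing by the number of ratees) cannot be told from the text, so the sketch stops at the raw sum.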


Sisson claims that the forced-choice technique calls for objective 
ratings, minimizes subjective judgments, reduces a rater’s ability 
to produce any desired or predetermined outcome, diminishes the 
effects of favoritism and personal bias, produces a better distribution 
of ratings, is less subject to influence by the rank of the officer being 
rated, and produces more valid ratings. 

It may be possible that the forced-choice technique possesses many 
of these advantages, but the evidence which Sisson presents is not 
entirely clear and leaves some of these alleged advantages open to 
doubt. One of the major doubts concerns the alleged better distribu- 
tion of ratings. By a better distribution of ratings one usually means 
a more normal distribution, that is, a less skewed distribution of 
ratings. However, in the figure that Sisson gives, the distribution of 
ratings is not much less skewed than that secured from the older 
graphic army rating scale. This being the case, it may still be an open 
question as to whether the forced-choice technique eliminates en- 
tirely or even lessens to any considerable extent the possibility that 
a rater can manipulate the scale in such a way as to produce any 
desired or predetermined outcome. In fact Sisson points out that 
as soon as it became known that the ratings were to be official rather 
than experimental, a marked shift in the mean (upwards) took place. 

As described by Sisson, it is evident that the forced-choice technique 
differs from the typical rating form not only in the forced-choice 
element but also in the fact that the items are validated 
against independently secured criterion groups of successful and 
unsuccessful officers. To date it has not been demonstrated by the 
proponents of the forced-choice technique that any merit it may 
possess is not due to this validation factor rather than to the forced-choice 
element. We cannot rule out the possibility that the forced-choice 
element is the important factor giving the technique its 
value, but its proponents have been careless in assuming and in 
suggesting that the increased effectiveness of the technique is due 
entirely to the forced-choice factor. 

We may summarize certain of the basic assumptions in the forced- 
choice system of rating. These assumptions are not necessarily 
unique to the forced-choice system of rating, however. 


1. Any real difference between one officer and another can be described in terms 
of objective, observable items of behavior. 

2. These objective, observable items of behavior differ in the extent to which 
raters tend to use them in describing the people they rate. That is, some of them 
are used frequently and others are used infrequently. Some are popular and some 
are not. 

3. These objective, observable items of behavior differ from each other in the 
extent to which they can discriminate between good and poor officers. 

4. Pairs of items can be selected in such a way that both items have the same 
preference value and therefore eliminate the possibility of choice of one of them as 
being more acceptable than the other. And at the same time the two items can 
differ in discriminative or diagnostic value and so offer a real possibility for choice 
(although hidden) as to the ratee’s being a better or a poorer officer in terms of 
these items. 
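
The pairing idea in the fourth assumption, together with the grouping of pairs into tetrads, can be illustrated with a short sketch. Everything in it is hypothetical: the item labels, the index values, and the two tolerance thresholds are invented for the example and are not taken from the Army scale.

```python
from itertools import combinations

# Hypothetical items: (label, preference value, discriminative value).
ITEMS = [
    ("A", 3.1, 90),
    ("B", 3.0, 20),
    ("C", 1.2, 85),
    ("D", 1.1, 15),
    ("E", 2.0, 50),
]


def matched_pairs(items, max_pref_gap=0.2, min_disc_gap=40):
    """Pairs that are close in preference value but far apart in discriminative value."""
    pairs = []
    for a, b in combinations(items, 2):
        if abs(a[1] - b[1]) <= max_pref_gap and abs(a[2] - b[2]) >= min_disc_gap:
            pairs.append((a, b))
    return pairs


def build_tetrad(pairs):
    """Combine the highest-preference and lowest-preference pairs into one tetrad."""
    if len(pairs) < 2:
        return None
    ordered = sorted(pairs, key=lambda p: (p[0][1] + p[1][1]) / 2)
    low, high = ordered[0], ordered[-1]
    return [item[0] for item in high + low]


pairs = matched_pairs(ITEMS)
tetrad = build_tetrad(pairs)
print(tetrad)
```

Within each pair the rater sees two items that look equally acceptable, while the scorer knows that only one of them separates the good from the poor officers.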


An important assumption in the forced-choice technique is that 
the relative diagnostic value of the elements in a pair is unknown 
and undetectable by the rater. In other words we are going on the 
assumption that a rater can pick out a more acceptable item and 
can distinguish it from a less acceptable item but that he cannot 
pick out a more diagnostic item and distinguish it from a less diagnostic 
item. This assumption is not entirely well founded, for there 
is abundant evidence that a person can, when he so desires, get 
pretty close to any predetermined or desired score on the Terman-Miles 
Masculinity-Femininity test or on the Strong Vocational 
Interest Test. To do this the person taking the test must be able to 
pick out the diagnostically significant items as well as the popular 
answers or responses. Also in the development of the equal-appearing-interval 
type of attitude or merit-rating scale, it must be assumed 
that the rater is capable of picking out the really significant items 
and of actually estimating fairly well their diagnostic value. 


Insofar as clues from preferences are concerned, the forced-choice 
technique is useful in eliminating them as a basis for choice. Thus 
if the preference values are related to diagnostic values (in the 
original group of items), we can eliminate this as a cue. This does not 
mean, however, that the rater still may not have access to other cues 
and may nevertheless be able to secure any predetermined or desired 
result. 

Some of the more ardent proponents of the forced-choice tech- 
nique claim that the method is entirely new and of recent origin. 
This is patently untrue. In the Strong Vocational Interest Test one 
of the sections asks a subject to indicate which three activities he 
most prefers and which three activities he least prefers. This leaves 
four activities to be marked as neutral. In the Terman-Miles Mascu- 
linity-Femininity test certain of the sections present four choices, 
two of which are known to be preferred by men and the other two 
by women. The subject is forced to choose only one of the responses, 
however. In the Allport-Vernon Study of Values a subject is forced 
to choose between different alternatives known to have different 
diagnostic values but the same preference values. These examples 
could be multiplied severalfold. 


THE L.O.M.A. MERIT RATING SCALES 


Despite their frequently demonstrated lack of reliability, validity, 
and objectivity, certain varieties of nonanalytical rating scales 
continue to be found in general use for employee evaluation. The 
reasons for this probably lie in the ease with which supervisors can 
complete such forms, in their acceptability and apparent face 
validity, and in the illusion that they are easy to design. It is this 
last point, probably, that is responsible for many of the faults of the 
nonanalytical methods of rating. All too often, a rating scale results 
from a conference in which several people sitting around a table 
“dream up” a rating form. The characteristics to be rated, their 
scaling, and the scoring of the items are far removed from the 
realities of the situation in which the ratings are to be made. 

In 1944 the Clerical Salary Study Committee of the Life Office 
Management Association took cognizance of this situation and 
instigated a series of research studies designed to lead to the development 
of an adequate, reliable, and objective series of merit-rating 
scales to be used in employee evaluation. These scales have been 
described in an article called “The L.O.M.A. Merit Rating Plan—Manual 
of Directions for Use.” 

In designing the research which led to the L.O.M.A. Merit Rating 
Scales, consideration was given to the following factors: 


1. The traits to be used would be those found on the basis of research to be considered 
most important by the people who were in the best position to know: first 
line supervisors. 

... ever-present danger that raters will base their judgments, in part at least, on observations 
made in other than work situations, and on characteristics which have no 
relationship whatever to the employees’ “on-the-job” effectiveness. 

4. Characteristics on which workers are to be evaluated must have known and 
constant meaning. Unless this is true, ratings by different supervisors cannot meaningfully 
be compared with each other. 

5. It is a well-established psy... ...ales is a function not only of differences among 
... ...ced by the particular 
set of items employed, their interrelationships to each other, and so forth. When the 
nature of the distribution of measured ability is known raw scores should be scaled 
to that distribution. When such information is not available, an informed guess is 
an improvement over that which is likely to result from a number of uncontrolled 
factors. 

6. Not all characteristics important in employee evaluation are best measured 
by supervisors’ ratings. General intelligence, spelling and arithmetic ability and the 
like are better assayed by objective tests. They have no place in a rating scale. 


Trait Importance. On the basis of 
literature and an examination of 
Clerical Salary Study Committee 


TABLE 145. The L.O.M.A. Merit Rating Scales 

1. Ability to work with others      9. Efficiency 
2. Adaptability                    10. Initiative 
3. Attitude toward company         11. Interest in work 
4. Common sense                    12. Loyalty 
5. Cooperativeness                 13. Punctuality 
6. Courtesy                        14. Reliability 
7. Dependability                   15. Tactfulness 
8. Disposition                     16. Trustworthiness 


TABLE 146. L.O.M.A. Merit Rating Scale No. 1—Form A* 

Is on his toes to help in any emergency. 
Tries to lord it over other employees. 
Helps others over any difficult situation. 
Annoys other people. 
Works with others at every opportunity. 
Is adept in adapting himself to the needs and wishes of fellow employees. 
Feels that he owes nothing to anybody. 
Is inclined to “tattle.” 
Provides just the “spark” that is needed for effective teamwork. 
Promotes harmony. 
Plans his own activities with every consideration for their effect upon other people. 
Is unwilling to lend a helping hand. 
Works well with any group of employees. 
Has an excellent group spirit. 
Goes his own way regardless. 
Pays little attention to any time schedule even though the work of others depends upon 
his close adherence to such a schedule. 
Gets mad if things don’t go to suit him. 
Takes much pride in group accomplishment. 
Refuses to see other fellow's point of view. 
Belittles the work of others. 
Accepts group policy even though it may differ from his own. 
Hinders the work of fellow employees. 
Is discourteous to those with whom he has to work. 
Talks against any group with which he is associated. 
Works only for the welfare of the group. 
Is careful not to let any failure on his part to complete work interfere with that of others. 

* From Ferguson, L. W. (Ed.) Manual of Directions for Use—L.O.M.A. Merit Rating 
Scales. New York: Life Office Management Association, 1950. 


were sent to 320 supervisors in 50 companies. Each supervisor was 
asked to indicate: 

1. The 10 most important traits 
2. The 20 next in importance 
3. The 10 least important 
4. The 20 next least important 

The middle 40 traits were left unchecked. This “forced-check 
list” procedure was used to preclude the possibility of the supervisors’ 
assigning a large number of traits to a single category, particularly 
to the most important category. The frequencies with which 
the traits were assigned to the various categories were reduced to 
percentages, and weighted mean ratings, one for each trait, were 
computed. Weights extending from 5.00 (for traits of greatest importance) 
to 1.00 (for traits of least importance) were used. These 
weighted means ranged from 4.63 for “accuracy” to 1.29 for “dominance.” 
The total possible range extended, of course, from 5.00 (for 
a trait that was universally assigned to the “top ten” category) to 
1.00 (for a trait always listed among the bottom ten). 

Trait Variability. Many merit-rating scales fail to provide the 
basis for an adequate degree of variation in assigning ratings, and 
many raters are nondiscriminating with respect to the relative 
degrees of proficiency which can be exhibited by a group of employees. 
These two facts made it desirable, it seemed, to secure 
evidence concerning the extent of variability “inherently” associated 
with each trait and concerning the degree to which this 
“inherent” variability would become evident in supervisors’ ratings. 
To secure appropriate data, the 30 most important traits were 
incorporated into rating scales of the type illustrated in Fig. 9 
(Chap. 10). These 30 scales were divided into three subsets, as 
illustrated in Table 147. 
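
The weighted mean importance rating obtained from the forced-check list can be sketched as follows. The text gives only the end-point weights (5.00 and 1.00); the intermediate weights of 4, 3, and 2, the category names, and the example frequencies are assumptions made for this illustration.

```python
# Assumed category weights; the source states only the end points 5.00 and 1.00.
CATEGORY_WEIGHTS = {
    "top_10": 5.0,
    "next_20": 4.0,
    "middle_40": 3.0,       # traits left unchecked
    "next_least_20": 2.0,
    "bottom_10": 1.0,
}


def weighted_mean_rating(counts):
    """Weighted mean importance rating for one trait from category frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no supervisors rated this trait")
    # Reduce the frequencies to percentages, then weight them.
    percentages = {k: 100.0 * v / total for k, v in counts.items()}
    return sum(CATEGORY_WEIGHTS[k] * p for k, p in percentages.items()) / 100.0


# Hypothetical returns from 320 supervisors for a single trait.
example = {"top_10": 200, "next_20": 80, "middle_40": 30,
           "next_least_20": 8, "bottom_10": 2}
rating = weighted_mean_rating(example)
print(round(rating, 2))
```

A trait placed in the top ten by every supervisor receives 5.00 and one always placed in the bottom ten receives 1.00, which reproduces the total possible range stated in the text.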


TABLE 147. Traits Important for Evaluating Clerical Performance* 

Ability to work with others    Adaptability               Attitude toward work 
Accuracy                       Attitude toward company    Job skill 
Ambition                       Common sense               Judgment 
Conscientiousness              Cooperativeness            Knowledge of work 
Disposition                    Courtesy                   Loyalty 
Initiative                     Dependability              Productivity 
Interest in work               Efficiency                 Neatness 
Perspective                    Punctuality                Thoroughness 
Quality of work                Tactfulness                Trustworthiness 
Reliability                    Volume of work             Willingness to accept responsibility 

* From Ferguson, L. W. The L.O.M.A. Merit Rating Scales. Person. Psychol., 1950, 3, 193-216. 

The directions for making the ratings on these scales followed 
closely those presented in Table 139 (Chap. 10). In accord with these 
directions the ratings were completed by 112 supervisors using 
Set 1, 105 supervisors using Set 2, and 109 supervisors using Set 3. 
The distributions of ratings secured are presented in Table 148. The 


TABLE 148. Recapitulation of the Distributions of 32,600 Ratings on 30 Traits* 

Rating of performance               Number    Per cent    Cumulative percentage 
9. Distinctly superior                4621      14.17          14.17 
8. Considerably above average         7108      21.81          35.98 
7. Moderately above average           6607      20.27          56.25 
6. Slightly above average             4816      14.77          71.02 
5. Average                            6278      19.26          90.28 
4. Slightly below average             1927       5.91          96.19 
3. Moderately below average            746       2.28          98.48 
2. Considerably below average          365       1.12          99.60 
1. Distinctly inferior                 132        .40         100.00 

Total                               32,600     100.00 

* From Ferguson, L. W. The L.O.M.A. Merit Rating Scales. Person. Psychol., 1950, 3, 193-216. 


majority of the ratings fall into the upper third (that is, the more 
favorable third) of each scale. Approximately 56 per cent of the 
ratings (total for all traits) were allocated to positions 7, 8, and 9 
(moderately above average, considerably above average, and dis- 
tinctly superior), and 90 per cent were average or better. In contrast 
only 4 per cent of the ratings were allocated to positions 1, 2, and 3 
(moderately below average, considerably below average, or distinctly 
inferior). 

To eliminate a large measure of the skewness evident in Table 
148, the ratings were reallocated on a five-step scale. This reallocation 
involved the reassignment given in Table 149. 


TABLE 149. Reallocation of Ratings Presented in Table 148* 

Original category    New category    Per cent 
9                    5               14.17 
8                    4               21.81 
7, 6                 3               35.04 
5                    2               19.26 
4, 3, 2, 1           1                9.72 

* From Ferguson, L. W. The L.O.M.A. Merit Rating Scales. Person. Psychol., 1950, 3, 193-216. 

Skewness was not entirely eliminated, but the reallocated ratings 
approximate much more closely those of a normal distribution than 
did those on the nine-step scales. Following this reallocation of 
ratings, standard deviations were computed for each of the 30 traits. 
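
The reallocation of Table 149 and its effect on skewness can be checked with a brief sketch. The counts are those of Table 148 and the mapping is that of Table 149; the use of the moment coefficient of skewness is our own choice of measure, since the text does not say how skewness was judged.

```python
import math

# Number of ratings at each step of the nine-step scale (Table 148).
NINE_STEP_COUNTS = {9: 4621, 8: 7108, 7: 6607, 6: 4816, 5: 6278,
                    4: 1927, 3: 746, 2: 365, 1: 132}

# Reassignment of original categories to new categories (Table 149).
REALLOCATION = {9: 5, 8: 4, 7: 3, 6: 3, 5: 2, 4: 1, 3: 1, 2: 1, 1: 1}


def reallocate(counts, mapping):
    """Collapse counts on the original scale onto the new scale."""
    new_counts = {}
    for step, n in counts.items():
        new_counts[mapping[step]] = new_counts.get(mapping[step], 0) + n
    return new_counts


def moments(counts):
    """Return (mean, standard deviation, skewness) of a frequency distribution."""
    total = sum(counts.values())
    mean = sum(step * n for step, n in counts.items()) / total
    var = sum(n * (step - mean) ** 2 for step, n in counts.items()) / total
    third = sum(n * (step - mean) ** 3 for step, n in counts.items()) / total
    sd = math.sqrt(var)
    return mean, sd, third / sd ** 3


five_step_counts = reallocate(NINE_STEP_COUNTS, REALLOCATION)
total = sum(five_step_counts.values())
five_step_percent = {k: round(100 * v / total, 2) for k, v in five_step_counts.items()}

_, _, skew_nine = moments(NINE_STEP_COUNTS)
_, _, skew_five = moments(five_step_counts)
print(five_step_percent)
print(round(skew_nine, 2), round(skew_five, 2))
```

Percentages computed directly from the counts may differ from the printed ones by 0.01, because the printed percentages appear to have been obtained by differencing the cumulative column.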


TABLE 150. Distribution of Means (by Trait) of Equal-appearing-interval Statements* 

                              No. of                    Value of mean 
Trait                         raters    2.49   3.49   4.49   5.49   6.49   7.49   9.00   Total 

Ability to work with others. 75 3) 22| H 2| 29] 21 Z| 100 
Adaptability ...| 80 2| 24| 22 2| 26) 21 3 100 
Attitude toward company....... 81 7} 16| 23 8| 24| 17 5 | 100 
Common sense..........00000+5 80 4| 20| 25 $i 25 18 5| 100 
Cooperativeness. sal 2B Ti 19) 17] 10) 20) 20 7| 100 
Coene aa dn liaa 78 8| 14| 24| 6| 20| 24] 4] 100 
Dependability... -000000000 79 6| 19| 18| 9| 2| 233| 4| 100 
Disposition ... 78 8| 18] 20] 7] 21| 22) 44] 100 
Efficiency... Bri fae 3 24 20 6| 21 22 4 100 
Initiative..... ol ga 5| D| 24 A} 21| 22 5 100 
Interest in work. we) 76 4| 19| 24 6j 21] 20 5 100 
Loyalty...... ot et 4| 24] 19] 7] 21] 22] z| 100 
Punctuality 82 5] 18] 25 6) 25) 18 3 100 
Reliability... 82 6| 15| 26 31 28 17 5 100 
Tactfulness 82 6| 18| 26 2i 22i 22: 4 100 
Trustworthiness 79 Bc) J8] 235 4| 20] 27 1 100 
AGE sissa isean se annad 83 | 307 | 359 | 85 | 366 | 336 | 64 | 1,600 


* From Ferguson, L. W. The L.O.M.A. Merit Rating Scales. Person. Psychol., 1950, 3, 193- 
216. 


Descriptive Statements. The L.O.M.A. Clerical Salary Study 
Committee decided that the 30 traits which had been found most 
important and, of these, the 16 which had produced the greatest 
variability in ratings (that is, which had the largest standard devia- 
tions) would provide the basis for the preparation of items for the 
final scales. Therefore for each of these 16 traits, 100 statements 
descriptive of behavioral elements to be rated were prepared. For 
each trait half of the statements were so worded that affirmation 
of their applicability to an employee would indicate the presence 
of the trait to an average or greater than average degree. The other 
50 statements in each set were so worded that their affirmation would 
indicate the presence of the trait to an average or less than average 
degree. Statements were prepared to be as unambiguous as possible 
and they were written in language that supervisors ordinarily use in 
evaluating employees. 

The next step was to have the statements in each of the 16 sets 
scaled by the equal-appearing-interval technique. From 75 to 83 
supervisors scaled each set of 100 statements. A nine-point scaling 
system was employed. The technique used here differed from that 
ordinarily employed in that it restricted the number of statements 
that could be placed in each of the nine categories. This restriction 
was imposed so that each supervisor would have to give careful 
consideration to a statement before deciding that it had high diag- 
nostic value. Frequency distributions were prepared, and the mean 
and standard deviation of the ratings for each statement were 
computed. Table 150 presents the distribution of means for the 16 
traits, and Table 151 presents the 16 distributions of standard 
deviations. 


TABLE 151. Distribution of Standard Deviations (by Trait) of Equal-appearing-interval 
Statements* 

                    No. of                         Value of standard deviation 
Trait               raters   .25-.54  .55-.69  .70-.84  .85-.99  1.00-1.14  1.15-1.29  1.30-1.44  1.45-1.74  Total 
"Ability to work with others | 75 3| | 27] 28] 19| 9] -| 100 
Adaptability. |... 80 1] 13| 32| 33| 17| 3| 1| 10 
Attitude toward company | 81 11 251 31 22 7 2 2h 100 
Common sense. ....-. +--+ so | ..| 2| mi 32 40) wo] 3 100 
Cooperativeness. . 83 3| a| 2| 36| 16] 8| 4] «-| 100 
OE Cem nunnan saanen 78 12| 26| 31| 18| 10| 1] 2] 100 
Dependability..........-. so | ..| a a se] i] we] ee] sx) ae 
\sposition. . wl 28 3| s| 23] 33] 21| 132| 2) 2] g% 
Efficiency a7 | 2) ili toil ao) a2] ts) a 2) aoe 
Initiative 79 3| as] 27] 32] 18] 4] 1] 100 
Interest in work saa] 76 Y: 4| 15| 37) 28 M 2 100 
OValtys vasa sc 81 il +l Worl 26] 32) E] 4 100 
Punctuality 82 il 2) 6| 25| 36| 18} 7 S 100 
Reliability, . “lB | cel) | te] Bil Bey aay 3 1} 100 
Tactfulness. . .. 82 il aE g 3S) 0 S 100 
trustworthiness 79 4 | ael a |) 258] 20] Ea 8 
ital r |ia | 82 | 264 | 483 | 448 | 231 | 62] 16 1,600 


* From Ferguson, L. W. The L.O.M.A. Merit Rating Scales. Person. Psychol., 1950, 3, 193-216. 

The selection of statements for the final forms was confined largely 
to those whose mean values were 6.5 or higher or 3.5 or lower and 
to those whose standard deviations did not exceed 1.30. Ninety-four 
and one-half per cent of 416 statements had mean values of 
6.5 or over, 93.0 per cent of 416 additional statements had values 
under 3.5, and 93.3 per cent of all 832 statements had standard 
deviations less than 1.30. 
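
The scaling and selection of statements can be sketched briefly. The statement labels and the judges' placements below are hypothetical, and the use of the population standard deviation is an assumption; the text does not say which formula the committee used. Only the cutoffs (6.5, 3.5, and 1.30) come from the text.

```python
from statistics import mean, pstdev

# Hypothetical placements of three statements on the nine-point scale.
PLACEMENTS = {
    "statement A": [8, 9, 8, 7, 8, 9],
    "statement B": [2, 1, 2, 3, 2, 1],
    "statement C": [2, 5, 7, 3, 6, 4],
}


def scale_statement(ratings):
    """Scale value (mean) and spread (standard deviation) of one statement."""
    return mean(ratings), pstdev(ratings)


def select_statements(placements, high=6.5, low=3.5, max_sd=1.30):
    """Keep statements with extreme mean values and small standard deviations."""
    selected = {}
    for text, ratings in placements.items():
        m, sd = scale_statement(ratings)
        if (m >= high or m <= low) and sd <= max_sd:
            selected[text] = (round(m, 2), round(sd, 2))
    return selected


selected = select_statements(PLACEMENTS)
print(selected)
```

Statement C is dropped in this example because its mean falls in the middle of the scale, which is the region the committee largely excluded.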

No independent criterion population was available for item 
analysis purposes, so the L.O.M.A. Clerical Salary Study Committee 
established an arbitrary scoring system. First, the mean value of 
each statement was rounded to the nearest whole number. For the 
“positive” statements, that is, those with means of 9, 8, or 7, the 
weights given in Table 152 were established. Conversely, for the 
items at the opposite extreme, that is, those with means of 1, 2, or 
3, the weights given in Table 153 were assigned. In the interest of 


TABLE 152. Scoring Weights for “Positive” Statements* 

                                                 Mean value 
Response                                        7     8     9 
Always or completely characteristic             7     8     9 
Usually or almost characteristic                6     7     8 
Sometimes or moderately characteristic          5     6     7 
Seldom or slightly characteristic               4     5     6 
Never or not at all characteristic              3     4     5 

* From Ferguson, L. W. The L.O.M.A. Merit Rating Scales. Person. Psychol., 1950, 3, 193-216. 


TABLE 153. Scoring Weights for “Negative” Statements* 

                                                 Mean value 
Response                                        1     2     3 
Always or completely characteristic             1     2     3 
Usually or almost characteristic                2     3     4 
Sometimes or moderately characteristic          3     4     5 
Seldom or slightly characteristic               4     5     6 
Never or not at all characteristic              5     6     7 

* From Ferguson, L. W. The L.O.M.A. Merit Rating Scales. Person. Psychol., 1950, 3, 193-216. 

making one uniform scoring key for all 16 scales, several slight 
departures from this system were allowed. 
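
The regular pattern in Tables 152 and 153 can be expressed as a small scoring function. This is a sketch of the pattern only: the function and response names are ours, halves are assumed to round upward, and the "slight departures" mentioned above are not reproduced because the text does not describe them.

```python
RESPONSES = [
    "always",     # always or completely characteristic
    "usually",    # usually or almost characteristic
    "sometimes",  # sometimes or moderately characteristic
    "seldom",     # seldom or slightly characteristic
    "never",      # never or not at all characteristic
]


def scoring_weight(mean_value, response):
    """Weight for one response to a statement with the given scale mean."""
    rounded = int(mean_value + 0.5)  # nearest whole number; halves round up (assumed)
    step = RESPONSES.index(response)
    if rounded in (7, 8, 9):   # "positive" statements, Table 152
        return rounded - step
    if rounded in (1, 2, 3):   # "negative" statements, Table 153
        return rounded + step
    raise ValueError("statement mean is outside the scored ranges")


def total_score(answers):
    """Sum the weights over (mean value, response) pairs for one employee."""
    return sum(scoring_weight(m, r) for m, r in answers)


example_total = total_score([(8.2, "usually"), (2.4, "never"), (7.1, "sometimes")])
print(example_total)
```

Under this key a favorable report always earns the larger weight: "always" on a positive statement and "never" on a negative statement both score high.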

Standardization. To obtain data for standardization purposes 
the L.O.M.A. Clerical Salary Study Committee secured ...tal 
ratings for more than 17,000 employees. To reduce bias in the 
selection of employees for this purpose, the following instructions 
were sent to the cooperating companies: 

Some person other than the supervisor who is to do the rating should select 
the employees who are to be rated. This will make it impossible for a supervisor 
to select his favorite (or vice-versa) as a basis for the ratings. To insure the greatest 
degree of comparability from company to company, will you please include em- 
ployees representing all levels of performance. In terms of a nine-step scale of 
overall ability, we should like the entire employee group for your company to 
consist of approximately 

4% classified as distinctly superior 
7% classified as considerably above average 

12% classified as moderately above average 

17% classified as slightly above average 

20% classified as average 

17% classified as slightly below average 

12% classified as moderately below average 

7% classified as considerably below average 
4% classified as distinctly inferior 

If the norms which we intend to prepare are to serve their purpose, it is absolutely 
essential that we include employees differing widely in, and representing all, levels 
of performance. 

Ratings were secured for 500 or more employees for each of the 16 
scales. In spite of the instructions issued there was considerable 


TABLE 154. Interpretation of Standard Scores on L.O.M.A. Merit Rating Scales* 

Score %    Rating    Interpretation 
   4         A       Distinctly superior 
   7         B+      Considerably above average 
  12         B−      Moderately above average 
  17         C+      Slightly above average 
  20         C       Average 
  17         C−      Slightly below average 
  12         D+      Moderately below average 
   7         D−      Considerably below average 
   4         E       Distinctly inferior 

* From Ferguson, L. W. The L.O.M.A. Merit Rating Scales. Person. Psychol., 1950, 3, 193-216. 

skewness in the raw-score distributions. Therefore these distributions 
were normalized before final standards were set up. The 
interpretation applying to the standard scores is given in Table 154. 
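
One way in which skewed raw scores can be forced into the nine-category distribution of Table 154 is sketched below. The text does not describe the normalizing procedure actually used, so the rank-based assignment here is an assumption, as are the function name and the treatment of tied scores (ties are simply left in their input order).

```python
# Ratings from highest to lowest and their target percentages (Table 154).
GRADES = ["A", "B+", "B-", "C+", "C", "C-", "D+", "D-", "E"]
PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def normalize(raw_scores):
    """Assign each raw score a letter rating from its rank within the group."""
    n = len(raw_scores)
    # Rank employees from highest to lowest raw score.
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    ratings = [None] * n
    for rank, i in enumerate(order):
        # Midpoint percentile position of this rank, counted from the top.
        position = 100.0 * (rank + 0.5) / n
        cumulative = 0
        for grade, pct in zip(GRADES, PERCENTS):
            cumulative += pct
            if position <= cumulative:
                ratings[i] = grade
                break
    return ratings


scores = list(range(100))          # hypothetical raw scores, all distinct
ratings = normalize(scores)
counts = [ratings.count(g) for g in GRADES]
print(counts)
```

Because the assignment depends only on rank, the resulting distribution has the intended symmetrical shape whatever the skewness of the raw scores.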


THE APPRAISAL OF FIELD TRAINING REPRESENTATIVES 


One of the pressing problems in any organization is that of making 
adequate and fair appraisals of the work done by its employees. 
These appraisals are needed as an aid in determining salaries and 
wages and in deciding upon promotions, demotions, transfers, and 
other personnel changes. 

The Metropolitan Life Insurance Company uses five analytical 
rating forms in its field organization. Two of these forms are for 
agents, two are for assistant managers, and one is for traveling field 
training division representatives. Our task now is to describe the 
development of one of these forms and to show how it differs, as do 
the L.O.M.A. scales we have just described, from the nonanalytical 
methods we discussed in Chap. 10. We shall concern ourselves 
with the form developed for appraising the work performance 
of the Metropolitan’s staff of traveling field training division 
representatives. 


Step 1. The first step ... of collecting a large variety of statements ... 
business.” “Uses poor methods of prospecting and selling” and 
“Helps management determine the real needs for training.” This 
... a formal investigation in the field training division, suggestions offered by field 
training division re... ...sed knowledge of the duties ... field training division 
representatives, and ... investigator. 
When the complete... it was submitted to field training ... ...iew, and, as a 
... ...ure more adequate ... ...ng process, the 
statements were segregated into two groups of 100 statements each 
and were designated experimental appraisal forms A and B. 

Step 2. Next, field training supervisors (these are first-line super- 
visors) were asked to complete these trial appraisal forms for all field 
training instructors. 

Step 3. Field training division supervisors (these are second-line 
supervisors) were asked to supply criterion data consisting of degree 
of acquaintance ratings, numerical performance ratings, and paired- 
comparisons performance ratings. These ratings were made on forms 
similar to those presented in Figs. 9 and 11 (Chap. 10) and were 
secured for 64 field training instructors. 

Step 4. Two criterion groups of 20 field training instructors were 
selected. One group consisted of the 20 field training instructors with 
the highest composite criterion ratings, and the other group con- 
sisted of the 20 field training instructors with the lowest composite 
criterion ratings. These two groups were compared with each other 
so that the responses characteristic of each group could be identified. 
To illustrate, consider the statement “Bases suggested training
program on agent’s individual needs.” This statement was said to
be always or completely characteristic of 45 per cent of the
high-scoring field training instructors and of only 20 per cent of the
low-scoring field training instructors. It was said to be usually or
almost characteristic of 50 per cent of the high-scoring field training
instructors and of 55 per cent of the low-scoring field training
instructors. It was said to be sometimes or moderately characteristic
of 5 per cent of the high-scoring field training instructors and of 25
per cent of the low-scoring field training instructors. We see that
the response “always or completely characteristic” applies to a
larger percentage of high-scoring than of low-scoring field training
instructors but that the responses “usually or almost characteristic”
and “sometimes or moderately characteristic” apply to a larger
percentage of low-scoring than of high-scoring field training
instructors. The two remaining responses, “seldom or slightly
characteristic” and “never or not at all characteristic,” are
nondifferentiating, since they did not apply to the instructors in
either criterion group. When these comparisons were complete, the 78
items showing the greatest degree of differentiating power were
selected to comprise the final appraisal form.
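
The comparison in Step 4 can be sketched in a few lines of Python. The percentages are those quoted above for the statement on agents' individual needs; the function names and the response labels are illustrative assumptions and are not part of the original procedure.

```python
# Percentages of each criterion group receiving each response, as quoted
# in the text for "Bases suggested training program on agent's individual needs."
RESPONSES = ["always", "usually", "sometimes", "seldom", "never"]
high_pct = {"always": 45, "usually": 50, "sometimes": 5, "seldom": 0, "never": 0}
low_pct = {"always": 20, "usually": 55, "sometimes": 25, "seldom": 0, "never": 0}


def response_differences(high, low):
    """Return the high-minus-low percentage difference for each response."""
    return {r: high[r] - low[r] for r in RESPONSES}


def favored_group(diff):
    """Name the criterion group a response favors (hypothetical helper)."""
    if diff > 0:
        return "high"
    if diff < 0:
        return "low"
    return "neither"


diffs = response_differences(high_pct, low_pct)
for r in RESPONSES:
    print(f"{r:<10} {diffs[r]:+4d}  favors {favored_group(diffs[r])}")
```

A response whose difference is zero, as with the last two responses here, does not differentiate between the two criterion groups.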


Step 5. Field training supervisors were asked to reappraise all
field training instructors on the 78 items retained for use in the final
form. This reappraisal was requested because a review of the field 
training supervisors’ responses to the statements in the preliminary 
forms indicated a considerable degree of overrating. A series of 
meetings was held to caution field training supervisors against 
overrating, and in these meetings they proceeded with their re- 
appraisals. When these reappraisals were complete, each field train- 
ing supervisor was asked to cite instances of behavior on the part of 
each field training instructor that would support the answers he had 
given. If a field training supervisor could not do this, he was asked 
to lower his rating. 

Step 6. When the work of all field training instructors had been 
reappraised, the appraisal forms were scored on the basis of a 
formula which assigned a weight of 5 to the most favorable response, 
a weight of 4 to the next most favorable response, and so on, down 
to a weight of 1 for the least favorable response to each statement. 
Consequently, on 46 statements the response “always or completely
characteristic” carried a weight of 5, and on 30 statements the
response “never or not at all characteristic” carried a weight of
5. Since two statements were ambiguous in nature, they were not
used.
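
The preliminary 5-to-1 scoring rule of Step 6 can be sketched as follows. This is a minimal illustration, assuming responses are coded 0 ("always or completely characteristic") through 4 ("never or not at all characteristic"); the function names and the ordering of items in the key are hypothetical.

```python
N_RESPONSES = 5  # "always" (index 0) through "never" (index 4)


def item_weight(response_index, always_is_favorable):
    """Weight 5 for the most favorable response, down to 1 for the least."""
    if not 0 <= response_index < N_RESPONSES:
        raise ValueError("response_index must be between 0 and 4")
    if always_is_favorable:
        return N_RESPONSES - response_index
    return response_index + 1


def score_form(responses, keys):
    """Sum the item weights; keys[i] is True when "always" is favorable.

    Items keyed None (such as the two ambiguous statements) are not scored.
    """
    total = 0
    for response_index, key in zip(responses, keys):
        if key is None:
            continue
        total += item_weight(response_index, key)
    return total


# Hypothetical key matching the counts in the text: 46 + 30 scored, 2 unused.
keys = [True] * 46 + [False] * 30 + [None] * 2
best = [0] * 46 + [4] * 30 + [0] * 2   # most favorable answer to every item
worst = [4] * 46 + [0] * 30 + [0] * 2  # least favorable answer to every item
print(score_form(best, keys), score_form(worst, keys))
```

Under these assumptions the 76 scored statements give a possible range of 76 to 380 points.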


When a distribution of the scores secured on this basis became 
available, the diagnostic values of the 78 statements in the form were 
recomputed. To ensure the highest possible degree of internal 
consistency, 19 field training instructors who secured scores in the 
upper 27 per cent of the distribution just described and who also
secured scores in the upper 27 per cent of the distribution of criterion
scores described in Step 4 were chosen to represent successful field
training instructors. And 19 field training instructors who secured
scores in the lowest 27 per cent of the distribution which has just
been described and who also secured scores in the lowest 27 per cent
of the distribution of criterion scores described in Step 4 were
selected to represent unsuccessful field training instructors. Then
the comparison procedure illustrated in Step 5 was repeated. Field
training supervisors reported that the statement “Demonstrates
effectively good debit management” is always or completely
characteristic of 58 per cent of the successful field training
instructors but of none of the unsuccessful field training instructors.
This gives a difference of 58 per cent in favor of successful field
training instructors. Field training supervisors reported that this
statement is
usually or almost characteristic of 42 per cent of the successful field 
training instructors and of 63 per cent of the unsuccessful field 
training instructors. This gives a difference of 21 per cent in favor 
of unsuccessful field training instructors. The response “sometimes
or moderately characteristic” was said to apply to none of the
successful field training instructors and to 32 per cent of the
unsuccessful field training instructors. And the response “seldom or
slightly characteristic” was said to apply to none of the successful
field training instructors and to 5 per cent of the unsuccessful field
training instructors. The response “never or not at all characteristic”
applied to neither group. Final scoring weights were determined upon
the basis of these differences. For example, when a given response, say
“usually or almost characteristic,” was found to yield a difference
of 35 per cent in favor of successful field training instructors, it was
assigned a scoring weight of 35. If it was found to yield a difference
of, say, 42 per cent in favor of unsuccessful field training instructors,
it was assigned a scoring weight of -42. When it was found that a
strict application of this procedure would yield a logically
inconsistent series of scoring weights, certain of the responses were
combined before the weights were assigned. In one case, a strictly
literal application of the rule for determining weights gave the series
in Table 155.


TABLE 155. An Illogical Series of “Scoring Weights”

Always or completely characteristic . . . . . .   58
Usually or almost characteristic  . . . . . . .  -21
Sometimes or moderately characteristic  . . . .  -32
Seldom or slightly characteristic . . . . . . .   -5
Never or not at all characteristic  . . . . . .    0


This series is illogical. Fewer points are subtracted from the 
accumulating score for the response “seldom or slightly charac- 
teristic” than for the response “sometimes or moderately charac- 
teristic.” And fewer points still (in fact, none at all) are subtracted 
for the response “never or not at all characteristic.” In this instance,
therefore, it seemed proper to combine the last three responses and
to make no differentiation between them. This means that only
three effective discriminations are provided by the statement in
question. The scoring weights become 58 for the response “always
or completely characteristic,” -21 for the response “usually or
almost characteristic,” and -37 for the responses “sometimes or
moderately characteristic,” “seldom or slightly characteristic,” or 
“never or not at all characteristic.” This gives a logically consistent 
series of scoring weights. 
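
One way to mechanize the combining of responses is sketched below. It is an assumption, not the procedure stated in the text: the text says only that "certain of the responses were combined." The sketch pools adjacent responses whenever a less favorable response would otherwise earn a higher weight than a more favorable one, adding their differences together. The function name is hypothetical, and the rule as written applies to statements on which "always" is the favorable response.

```python
def pool_until_consistent(weights):
    """Pool adjacent responses until the weights never rise toward "never".

    Adding the differences of pooled responses is equivalent to taking the
    difference of their pooled percentages.
    """
    blocks = []  # each block is [pooled_weight, number_of_responses]
    for w in weights:
        blocks.append([w, 1])
        # A less favorable response must not outscore a more favorable one.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            w_last, n_last = blocks.pop()
            blocks[-1][0] += w_last
            blocks[-1][1] += n_last
    pooled = []
    for w, n in blocks:
        pooled.extend([w] * n)
    return pooled


literal = [58, -21, -32, -5, 0]          # the series in Table 155
print(pool_until_consistent(literal))    # [58, -21, -37, -37, -37]
```

Applied to the series in Table 155, this rule reproduces the weights adopted in the text: the last three responses are pooled and share a weight of -37.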

Step 7. Reliability was determined by correlating the scores on 
the odd-numbered statements with those on the even-numbered 
statements and by stepping this correlation up with the Spearman- 
Brown Prophecy Formula. The figure arrived at in this way is .97. 
This value compares favorably with that pertaining to the 
great majority of appraisal forms used by business and industrial 
organizations. 
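
The stepping-up in Step 7 can be illustrated with the usual Spearman-Brown formula, r_full = k r / (1 + (k - 1) r), with k = 2 for two half-length forms. The text does not report the odd-even correlation itself; the value printed below is only what the two-half formula implies for a stepped-up figure of .97, and the function names are illustrative.

```python
def spearman_brown(r_half, k=2.0):
    """Predicted reliability of a form lengthened k times."""
    return k * r_half / (1.0 + (k - 1.0) * r_half)


def implied_half_correlation(r_full):
    """Invert the two-half (k = 2) case of the formula."""
    return r_full / (2.0 - r_full)


r_half = implied_half_correlation(0.97)
print(round(r_half, 3), round(spearman_brown(r_half), 2))
```

On this reckoning an odd-even correlation of roughly .94 would step up to the reported .97.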

Step 8. The validity of the scores on the field training instructor’s 
appraisal form was determined by correlating them with the criterion 
ratings discussed in Step 4. This correlation is .60. This value is 
spurious, however, because it is based upon the data used in the 
derivation of the scoring weights. On the other hand, the correlation 
is low because the criterion scores do not possess perfect reliability. 

Uses. This appraisal form has been found useful in several ways. 
It has served as a criterion for the validation of a test for selecting
new field training instructors, and it has served as a basis for the
determination of salaries. It has served as a basis for making
promotions, and it has served as a basis for deciding upon training
assignments. In connection with this last purpose, it has indicated
which field training instructors have needed additional training, and
it has indicated in what areas of their performance this additional
training was needed.


12


PROJECTIVE TECHNIQUES: A PERCEPTUAL APPROACH


All the personality-measuring techniques we have discussed have 
been based upon the theory that responses to test items must be 
controlled. They must be controlled, it is thought, so that we can 
determine what they mean. Imagine our problem, for example, if 
we did not provide a limited number of responses to the items in the 
Strong Vocational Interest Test or in the Bernreuter Personality 
Inventory. As it is, there is still an enormous variety of responses
available. Let us stop and figure a moment. To each item on the
Strong Vocational Interest Test a subject can respond with an L,
an I, or a D. So if we consider just two items, there are the following
nine possible responses:

Item 1    Item 2
  L         L
  L         I
  L         D
  I         L
  I         I
  I         D
  D         L
  D         I
  D         D

When we add a third item, the number of possible responses
increases to 27. But there are 400 items in Dr. Strong’s test, so the
total number of possible responses is 3^400, that is, 3 raised to the
400th power. We do not need to figure this out to realize that we
have an enormous number on our hands.
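
The counting argument can be checked with a short Python sketch. It enumerates the patterns for two and three items and then reports only the size of 3 raised to the 400th power; the function names are illustrative.

```python
from itertools import product

CHOICES = ("L", "I", "D")  # the three permitted answers to each item


def response_patterns(n_items):
    """List every possible pattern of answers to n_items items."""
    return list(product(CHOICES, repeat=n_items))


def pattern_count(n_items):
    """Count the patterns without listing them: 3 ** n_items."""
    return len(CHOICES) ** n_items


print(len(response_patterns(2)), len(response_patterns(3)))  # 9 27
print(len(str(pattern_count(400))), "digits in 3 ** 400")    # 191 digits
```

The number of patterns for the full 400-item test runs to 191 digits, which is why nobody attempts to list them.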


Of course, not all tests provide for as many possible responses as
the Strong Vocational Interest Test. But they all provide for a
goodly number. This being the case, most of our methods of analysis 
are designed to classify and organize these responses into general 
categories amenable to statistical treatment. 

Some psychologists object to this procedure. They claim that we
lose a great deal of valuable information by ignoring the vast num . . .
a number of techniques in which responses are completely
uncontrolled. They have prepared these techni . . .


CONTENT OF TEST 


The Rorschach test is an . . . developed by
Hermann Rorschach, a Swiss psychiatrist, on the theory that what
we see in ink blots, and how . . .
our personality. If you see a beautiful tree because . . .
if I see a skeleton because of the sh . . .

There are 10 blots in the Rorschach test. F . . .
do not. These blots are reproduced . . .
inches, and these cards are shown one at a time to a subject. We
ask the subject to tell us what he sees in each blot, we record his 
responses, we take notes on his behavior, and later we ask him to 
explain the features of the blot which formed the bases for the 
pictures they suggested to him. 

There is very little extant about the way in which Rorschach
constructed or selected the particular ink blots . . .
have little to say, therefore, about the construction of the Rorschach
test. In fact, we have nothing further to add on this point. We shall 
try instead to describe what the test is supposed to do, how it is 
supposed to do these things, and how well it does them. 


PURPOSE OF TEST 


First, what is the Rorschach test supposed to do? Klopfer and 
Kelley, the authorities we shall follow in this chapter, say that it 
gives us 


a configurational picture which reveals the interplay between various major
intellectual and emotional factors in the personality of the subject. From this
picture the following structural aspects of personality are to be deduced:


1. The degree and mode of control with which a subject tries to regulate his 
experiences and actions 

2. The responsiveness of his emotional energies to stimulations from outside and 
promptings from within 

3. His mental approach to given problems and situations 

4. His creative or imaginative capacities, and the use he makes of them

5. A general estimate of his intellectual level and the major qualitative features 
of his thinking 

6. A general estimate of the degree of security or anxiety, of balance in general, 
and specific unbalances 

7. The relative degree of maturity in the total personality development 

[Klopfer and Kelley continue] This list does not represent a complete account
of the personality aspects revealed by the Rorschach method. It simply enumerates
the major structural elements in the configurational picture which the Rorschach
material reflects.


A SAMPLE PROTOCOL


Complete or not, this list represents a big order. To understand 
in what ways a Rorschach record can provide this information, we 
had best start with a complete protocol. Then we can base our dis- 
cussion on what we observe in this protocol. The protocol we shall 
give lists the reactions of Carmen, one of the subjects discussed by 
Merrill in her volume on Problems of Child Delinquency. Carmen 
gave a total of 30 responses, an average of 3 per card. It took her 
approximately eighteen minutes to give them. They are given in
Table 156. 

TABLE 156. Carmen’s Protocol*

Response  Card  Position†  Time (sec)  Performance
    1       1     rsu          5       Cliff or precipice with two people on it
    2       1     rsu         20       Cat
    3       1     usd         45       Skeleton of a prairie cow
                              60       Interval 1
    4       2     rsu          4       Insides of a person
    5       2     usd         15       Lamp - and a flame
                              35       Interval 2
    6       3     rsu          5       Skeleton: body and arms. Red blotches are
                                       the blood, middle red is the heart
    7       3     usd         30       Bogey man
    8       3     usd         35       Profiles, man with mustache
    9       3     usd         45       Branches, dried, no leaves
                              60       Interval 3
   10       4     rsu          5       Bear rug, eyes, queer kind of bear
   11       4     rsu         10       Pair of boots
   12       4     rsu         18       Ears of a dog
                              45       Interval 4
   13       5     rsu          3       Bat
   14       5     rsu         10       Snail. Head here, and horns stick up here.
                              27       Interval 5
   15       6     rsu          1       Fish
   16       6     rsu          9       Special kind of butterfly
   17       6     rsu         33       Bedpost
                              35       Interval 6
   18       7     rsu          3       Two ladies gossiping together. Funny
                                       hair-do’s. Both pointing toward places.
   19       7     usd         20       Couple dancers, old fashioned kind
                              29       Interval 7
   20       8     usd         11       Sweet peas
   21       8     tor         22       Hyenas with that shape
   22       8     usd         43       Back bones
   23       8     rsu         50       Face of a man, long beard, long hat on,
                                       like in funny books
                              61       Interval 8
   24       9     rsu          4       Sunset. Sun coming up over here; forest,
                                       road here, coming right down through here
   25       9     usd         30       Big fat lady, two arms and legs; pink
                              60       Interval 9
   26      10     rsu          2       Bunch of bugs and insects
   27      10     . . .      . . .     . . .
   28      10     . . .      . . .     . . . floating down from heaven on a
                                       parachute
   29      10     usd         33       Two people holding hands over cliff or
                                       bucket or object
   30      10     usd         56       Two devils climbing up a tree . . .
                              60       Interval 10

* From Merrill, M. A. Problems of Child Delinquency. Boston: Houghton Mifflin Company, 1947.
† rsu = right side up; usd = upside down; tor = top on right.

This protocol constitutes a record of Carmen’s performance, the
first phase of the Rorschach method. The protocol as it now stands
undoubtedly has very little meaning, even if you bothered to read it.
So let us go over it again and see what we should look for. First, we 
see that Carmen looked at Card 1 for five seconds and said she saw a 
cliff or precipice with two people on it. She looked at it for twenty 
seconds more and said she saw a cat. Then she turned the card 
upside down, looked at it this way for forty-five seconds, and said she 
saw the skeleton of a prairie cow. Now she laid Card 1 aside, taking 
sixty seconds to do this, and picked up Card 2. She looked at Card 
2 for four seconds and said she saw the insides of a person. She 
turned the card upside down, looked at it this way for fifteen seconds, 
and said she saw a lamp and a flame. Then she laid Card 2 on the 
table, picked up Card 3, and proceeded as indicated in the protocol. 
With this record before us, we can do two things: we can determine 
the content of Carmen’s responses, and we can analyze the time 
relationships involved. Let us begin with the content. 

Content. How many things did Carmen see? In Card 1 she saw 
two people, in Card 3 she saw a bogeyman, in Card 7 she saw two 
ladies gossiping together and two dancers, in Card 9 she saw a big 
fat lady, and in Card 10 she saw an angel, two people holding hands, 
and two devils climbing up a tree. This gives a total of eight re- 
sponses involving human beings or the human form. Carmen also 
saw many animal forms. For example, in Card 1 she saw a cat, in 
Card 5 she saw a bat, in Card 6 she saw a fish and a butterfly, in 
Card 8 she saw hyenas, and in Card 10 she saw a bunch of bugs and 
insects and some blue crabs. Thus eight of her responses had an 
animal content. A complete review of Carmen’s responses gives us
the tabulation in Table 157.


TABLE 157. The Content of Carmen’s Responses

Content      No.    Content                                  No.
. . .         8     Human anatomy                             1
. . .         3     Animal anatomy                            1
. . .         7     Objects                                   3
. . .         2     Objects created from parts of animals     1
. . .         4     Total                                    30


Now if you were to take the Rorschach test or if I were to take it, 
we would probably see many things that Carmen did not see, and 
some of the things she saw we would not see. Thus the content of
our responses might be quite different from the content of Carmen’s
responses. And in these differences lie, say the Rorschach experts, 
important clues to the nature of Carmen’s, of your, and of my 
personalities. 

One of the major things we want to note about the content of 
Carmen’s responses is the percentage involving animals. We want 
to know this because if it is large, it indicates, say the Rorschach 
experts, a stereotyped or a narrow range of interests. It indicates 
this, they say, because the choice of animal concepts is obvious. 
Therefore, a person who cannot choose a reasonable proportion of 
concepts not involving animals is merely reacting to the obvious. 
Carmen gave a total of 11 animal responses. This is just 37 per cent 
of the total number of responses which she gave, so we conclude that 
there is no evidence that Carmen’s interests are stereotyped or 
narrow. 

We were able to classify Carmen’s responses into nine content 
categories. These include those most important in any Rorschach 


analysis, but they by no means exhaust the possibilities. Bell has
compiled a list of 34 frequently used content categories, and we give
this list, for general information, in Table 158.


TABLE 158. Content Categories of Rorschach Responses*

Animal               Emblem             Sex
Animal detail        Fire               Statues
Abstract             Fountains          Structures
Alphabet             Geographical       Symbolic
Anatomy              Ice                Volcanic
Art                  Mask               . . .
                                        Water forms
Architecture         Mountains          Imaginary
Human                Blood              Cave
Human detail         Botanical forms    Color
Clouds, mist, fog    Nature             Numbers
Plants               Objects            Scenery

* From Bell, J. E. Projective Techniques. New York: Longmans, Green & Co., Inc., 1948.


Time. The second thing we said we could do with Carmen’s
protocol was to analyze the time relationships involved. The protocol
itself contains the time it took Carmen to make each response and
the time elapsing between cards. From these . . . compute
Carmen’s total response time, average response time, . . .
time, and average reaction time. This last . . .
time required for the first response to each card. We should also
compute, say the Rorschach experts, the average reaction time for 
the cards with color and compare this with the average reaction 
time for the cards without color. 

Carmen’s total response time was 617 seconds, or 10 minutes 17 
seconds. This makes her average response time not quite 21 seconds. 
Her total reaction time was 472 seconds, or 7 minutes 52 seconds.
This makes her average reaction time a trifle over 47 seconds.
Finally, Carmen’s average response time for the cards without color
was 45 seconds while that for the cards with color was 49 seconds.
These data would lead the Rorschach expert to conclude that 
Carmen has quick reaction and response time (less than one-half 
minute per response) and no color shock (that is, no practical 
difference in average response time for the colored and noncolored 
blots). 
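
The arithmetic behind these averages can be retraced with a short sketch. The totals are those quoted in the text (617 seconds, and 7 minutes 52 seconds, that is, 472 seconds); the list of intervals is read from the "Interval" rows of Table 156. That these intervals add up to the quoted reaction-time total is an observation about the figures, not something the text states, and the variable names are illustrative.

```python
# Totals quoted in the text (seconds) and counts from the protocol.
TOTAL_RESPONSE_TIME = 617
TOTAL_REACTION_TIME = 472
N_RESPONSES = 30
N_CARDS = 10

# "Interval" entries of Table 156 for Cards 1 through 10.
INTERVALS = [60, 35, 60, 45, 27, 35, 29, 61, 60, 60]


def minutes_seconds(total_seconds):
    """Split a number of seconds into (minutes, seconds)."""
    return divmod(total_seconds, 60)


avg_response = TOTAL_RESPONSE_TIME / N_RESPONSES
avg_reaction = TOTAL_REACTION_TIME / N_CARDS
session = minutes_seconds(TOTAL_RESPONSE_TIME + TOTAL_REACTION_TIME)

print(minutes_seconds(TOTAL_RESPONSE_TIME))   # (10, 17)
print(minutes_seconds(TOTAL_REACTION_TIME))   # (7, 52)
print(round(avg_response, 1), round(avg_reaction, 1))
print(sum(INTERVALS), session)
```

The two totals together come to a little over eighteen minutes, which agrees with the time Carmen is said to have taken over the whole protocol.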

Response and reaction times averaging over one and one-half
minutes are considered by Rorschach experts to be symptomatic of
severe pathology or of extreme inhibitions. And large differences in
the average response and reaction times for the colored and
uncolored blots are supposed to be symptomatic of severe emotional
disturbance. We see that Carmen, in her response and reaction time,
gives no evidence of such psychological malfunction.


THE INQUIRY 


We must now mention the inquiry, the second, and a very important,
phase of the Rorschach method. In the performance proper the
task of the examiner is to record whatever the subject says. But in 
the inquiry, which follows the performance, the task of the examiner 
is to ask the subject what factors contributed to the formation of his 
concepts. 

Location. One of the things which the examiner tries to find out 
is whether the concept was determined by the whole of the blot or
by some detail, and if by the latter, whether it was determined by a
large detail or by a small detail. And if it was determined by a small
detail, was this a usual detail or an unusual detail? And if it was an
unusual detail, was it an edge detail or an inside detail? In short, the
examiner tries to determine the location of the concept on, or
within, the blot area.

Carmen’s responses were located as indicated in Table 159. She
used the whole blot in 12 of her responses, normal details in 14 of
them, small normal details in 2 of them, and she utilized the white
space in the remaining 2. Examples of Carmen’s use of the whole blot
are in her responses of “cat” and “the insides of a person” (Card
2), “skeleton,” “bogey man” (Card 3), “bear rug” (Card 4), and
so on. Examples of her use of normal detail can be found in her
responses of “cliff or precipice with two people on it” (Card 1), “man
with mustache” (Card 3), “pair of boots” and “ears of a dog”
(Card 4), and so on.


TABLE 159. The Location of Carmen’s Responses

Whole blot . . . . . . . .  12
Normal detail  . . . . . .  14
Small normal detail  . . .   2
White space  . . . . . . .   2

Total  . . . . . . . . . .  30

When we contrast this distribution with that which might result 
from your responses to the Rorschach test (or mine), we find, say 
the Rorschach experts, our third set of clues to the inner structure 
of Carmen’s, your, and my personalities. The normal subject will 
give, according to Klopfer and Kelley, about 30 responses. Of these, 
20 will be based upon normal details, 7 will be based upon wholes, 
and the remaining 3 will be based upon small details. 
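
The contrast between Carmen's distribution and the typical one can be put in percentage terms with a short sketch. The counts are those given in the text; the dictionary keys and the function name are illustrative, and the typical record is entered exactly as quoted, with no white-space responses listed.

```python
# Location counts: Carmen's record and the typical record quoted from
# Klopfer and Kelley (about 30 responses in all).
carmen = {"whole": 12, "normal detail": 14, "small detail": 2, "white space": 2}
typical = {"whole": 7, "normal detail": 20, "small detail": 3}


def percentages(counts):
    """Convert location counts to percentages of all responses."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


carmen_pct = percentages(carmen)
typical_pct = percentages(typical)
for name in carmen:
    print(f"{name:<14} {carmen_pct[name]:5.1f}  {typical_pct.get(name, 0.0):5.1f}")
```

On these figures Carmen places 40 per cent of her responses on the whole blot against about 23 per cent in the typical record, and a little under 47 per cent on normal details against about 67 per cent.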


Carmen gave, we saw, 12 concepts in which the entire blot was
utilized. This is somewhat more than average and indicates a
tendency, say the Rorschachers, toward “an emphasis on the abstract
forms of thinking and the higher forms of mental activity.” A subject
who would base only a small number of his concepts on the whole
blot would be considered, in contrast with Carmen, to lack a capacity
for higher mental activity. Klopfer and Kelley say that “the
intelligent subject will . . . locate half of his responses in the obvious
areas of the cards . . . and will place 5 to 15 per cent of his responses
in other portions which are not so obvious.” According to this,
Carmen would be an intelligent subject, for 46 per cent of her
responses make use of normal (obvious) details and . . . make
use of small (but noticeable) details.

When a subject uses the whole blot, he may do this in several
ways. He may respond with a concept showing a superior and
elaborate organization; he may give a response making good, but
not superior, use of the entire blot; he may give a response suggested
by exterior outline, but one involving no extensive elaboration; he 
may utilize only the crudest features of the blot; or he may give 
wholly arbitrary responses. The subject can also give responses which 
tend only toward the use of the whole blot. For example, neglecting 
some detail either inadvertently or by design, he may not use the 
entire blot but just most of it. He may rather loosely or poorly 
organize the various portions of the blot into a combined whole. He 
may react to only one of the symmetrical halves of the blot, and
finally he may select some detail and use this as a basis for building
up a response for the whole blot.

When a subject bases too many of his concepts on details in
contrast to wholes, Klopfer and Kelley claim that this indicates, if
supported by other signs, a “trend toward mental escape from 
reality.” Further, they say that an emphasis on tiny details indi- 
cates the presence of obsessional traits, that an emphasis on inside 
details indicates preoccupation on the part of the subject with his 
own inner life, and that an emphasis on outside details indicates an 
attempt to keep from the troubled stirrings of the inner self. 

Carmen bases two of her concepts on the white spaces. This repre- 
sents a normal frequency. When a large number of concepts are 
based on the white spaces, it is supposed to indicate the presence of 
oppositional tendencies. This conclusion is reached because the 
use of the white spaces necessitates the reversal of the usual figure- 
ground relations. A person who has oppositional tendencies may 
show these in self-destructive tendencies or in negativistic behavior 
in general. 

The location of his responses will reveal, say the Rorschach experts,
the subject’s mental approach. They will show whether he has a 
preference for sweeping generalities, a tendency to get lost in un- 
related details, the compulsive habits of a perfectionist, or whether 
he exhibits the arbitrary digressions of an undisciplined mind. We 
show in Table 160 a considerable number of the variations which, 
according to Bell, can take place in the location of a subject’s 
concepts. 


When a subject bases a large majority of his concepts on the whole
blot area, this is taken as evidence that he has a tendency for making
sweeping generalities. When he bases a large number of his concepts
on details, particularly on small details, and does not integrate
these details into a whole blot concept, it is thought that he is
exhibiting a tendency to get lost in unrelated details. When a subject
is meticulous in pointing out that the whole blot could be a . . .
except for some detail, or that a normal detail could be a . . . except
for some lesser detail, it is thought that he is exhibiting the com- 
pulsive habits of a perfectionist. And finally, when a subject gives 
responses that are not well integrated with the blot area or that do 
not make efficient use of the various concept determinants, it is 
taken as evidence that he has an undisciplined mind. 


TABLE 160. Location Categories in Rorschach Responses*

1. Whole blot used for interpretation
2. Cut-off whole (a small portion of whole omitted)
3. Confabulatory response: a meaning assigned to the whole on the basis of
an interpretation of a detail
4. Whole response in which several details are successfully combined to produce a whole
5. Detail response with a tendency toward a whole interpretation
6. A normal detail of the blot
7. Small normal detail
8. Small infrequently used detail
9. Edge detail
10. Rare small detail
11. Inside detail
12. Oligophrenic detail: choice of a small area of a body where an individual normally gives a
whole body
13. White space between blot areas chosen for interpretation
14. Combinations of space details with whole, normal detail, or rare details
15. Interpretation of all the white area of the card

* From Bell, J. E. Projective Techniques. New York: Longmans, Green & Co., Inc., 1948.


In connection with this idea of mental approach, much is made of
the order in which the locations are used. A subject who proceeds in
each blot to give a concept based first on the whole of the blot, then
on a normal detail, then on a small detail, then on the white space
(or in the exact reverse order) is said to proceed in a rigid manner.
If he does this in all but one, two, or three cards, his approach is
said to be orderly. But if he fails to use this order in four to seven
cards, his approach is loose, and if he does not use any order, or
uses it in only one or two cards, his approach is said to be confused.

Determinants. A fourth clue to the structure of personality is
to be found, say the Rorschach experts, in the determinants of a
subject’s concepts. Did Carmen base her concepts on the form or on
the color of the blot? Or on both equally? Did she make use of the
shading? And if she did, just how did she make use of it? In
perspective, surface, or depth? Did she inject movement into her
responses? And if she did, was it human movement, animal movement,
or inanimate movement?
Inquiry revealed that Carmen used the form of the blot in nine of
her responses, human movement in eight of them, shading in five of
them, and so on, as indicated in Table 161.


TABLE 161. Determinants of Carmen’s Responses

Determinant           Number    Determinant        Number
Human movement           8      Form                  9
Animal movement          1      Form with color       2
Inanimate movement       2      Color with form       2
Shading, vista           2      Color only            1
Shading, surface         3      Total                30


Examples of Carmen’s imputation of human movement can be
seen in her response of “bogey man” (Card 3), in her response of
“two ladies gossiping together” (Card 7), and in her response of
“two people holding hands over a cliff or bucket or object” (Card
10). Examples of her use of form can be seen in her response of
“skeleton of a prairie cow” (Card 1), in her response of “profiles”
and “man with mustache” (Card 3), and in her response of “pair
of boots” (Card 4).

Carmen’s injection of human movement into eight of her responses
shows, say the Rorschach experts, that she is prompted from
within. But her use of form in nine of her responses shows that she
keeps these inner impulses under control. When a subject attributes
movement to a majority of the blot concepts, his inner impulses
are, say the Rorschach experts, running riot; and when he
overemphasizes form responses, he is subjecting his inner impulses to
overrigid intellectual control.

There are 13 major classes of determinants in the Rorschach
test. Each class can be manifested in several different ways. Some
of these ways are listed in Table 162, following Bell.

Form. The two chief characteristics of concern in form responses
are their definition and their accuracy. It is necessary to be
concerned with the form qualities of the concept as imagined by the
subject, with the form qualities of the blot itself, and with what are 
considered the conventional form qualities of the concept. Dis- 
crepancies can occur between the individual and conventional 


Table 162. Frequently Used Determinants of Rorschach Responses*


1. Form answers:
   a. Good forms
   b. Poor forms
   c. Percent of well-perceived forms in relation to the total number of forms
2. Movement answers:
   a. Human movement
   b. Impending human movement
   c. Animal movement
   d. Impending animal movement
   e. Inanimate movement
   f. Detail involving movement
3. Color responses:
   a. Color responses not involving form
   b. Color naming
   c. Color description
   d. Color symbolism
   e. Color with form used secondarily
   f. Color and form combined
   g. Sum of the color responses
   h. Disturbances in responding to colored cards
   i. Disturbances in reactions to the red color
4. Shading responses (chiaroscuro):
   a. Reactions in which subject responds to specific parts of the shading (e.g., highlights or
      cast shadows)
   b. Reactions to the total impression of shading
   c. Light-determined responses
   d. Vista responses: shading used to create perspective and differentiated surfaces
   e. Shading as surface texture
   f. Shading involved in projecting a three-dimensional object on a two-dimensional plane
      (e.g., X-ray, topographical maps)
   g. Shading used as diffusion
   h. Use of black, gray or white as color
   i. Disturbances in reactions to the shading elements in the cards


* From Bell, J. E. Projective Techniques. New York: Longmans, Green & Co., Inc., 1948.


concepts or between the form [...]
itself. These latter discrepancies [...]
arbitrary responses, in the mechanical [...]
one blot to the next, in confabulation (the attribution to the whole
of some concept suggested by a part), in inaccurate outline responses, 
and in a disregard for obvious form elements in the blots leading, by 
this neglect, to rather indefinite concepts. 

Responses of form are considered mediocre when they are merely 
of a popular variety, when they are noncommittal, when they seem 
to evade form, and when they make fairly accurate use of outline 
but of little else. Form responses above average in accuracy are 
those in which much elaboration or organization of the blot material 
occurs, and, of course, the greater the degree of elaboration or 
organization, or both, the more above average the response is con- 
sidered to be. 

Color. Responses involving color as a determinant can be classified 
into three general categories: achromatic color responses, combina- 
tions of color and form responses, and pure bright color responses. 
Achromatic color responses involve distinctions between the use or 
the rejection of achromatic color areas, combination of the achro- 
matic with bright colors, illumination effects, and the photographic 
(black and white) reproduction of bright colors. 

Combination color responses are generally divided into those in
which color is predominant but form is a necessary adjunct and
those in which form is dominant but color is a necessary adjunct.
Each of these two more general classes is subdivided into those
responses which seem to represent a “natural” combination, into
forced, arbitrary, and loose combinations, and into inaccurate
combinations.

Finally, pure, bright color responses are classified as crude, e.g.,
color naming or color description, and as the use of color in some
symbolic way.

In general, color responses are supposed to indicate a readiness
to establish healthy emotional relationships with the external world.
Form responses are supposed to indicate the state of balance
between the intellectual and emotional aspects of personality structure.
Movement responses are supposed to indicate the extent to which a
person is guided by his internal spontaneous impulses.

Klopfer and Kelley indicate that an optimum relation between
form and color responses consists of twice as many responses in
which form predominates over color as responses in which color
predominates over form and color responses alone. The use of color
apart from form is supposed to indicate an impulsive emotionality.
The more important form becomes in relation to color, the more the 
subject’s emotion is brought under control. An overemphasis on 
form in contrast with color, say when more than one-half the form 
responses have no color determinant, indicates a tendency to repress 
or control spontaneous reactions. 
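
The proportion quoted from Klopfer and Kelley can be restated as a small calculation. The sketch below is illustrative only: the function name and the shorthand FC (form predominating over color), CF (color predominating over form), and C (color alone) are labels assumed here, and the counts are read from Table 161 on the assumption that "form with color" corresponds to FC and "color with form" to CF.

```python
def form_color_balance(fc: int, cf: int, c: int) -> float:
    """Ratio of form-dominant color responses to color-dominant plus color-only responses.

    A value of 2.0 corresponds to the optimum relation described in the text:
    twice as many form-dominant responses as color-dominant and color-only
    responses taken together.
    """
    denominator = cf + c
    if denominator == 0:
        raise ValueError("no color-dominant or color-only responses to compare against")
    return fc / denominator


# Counts read from Table 161 under the assumed mapping described above.
carmen_balance = form_color_balance(fc=2, cf=2, c=1)
print(f"form/color balance: {carmen_balance:.2f} (optimum quoted in the text: 2.00)")
```

On this reading Carmen's record falls below the quoted optimum, but the mapping of her categories onto the three classes is an assumption and not part of the original analysis.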

Shading. Shading, as a determinant, can be used as a surface 
impression or as a depth impression. If it is used as a surface im- 
pression, it is supposed to indicate a tendency for the subject to be 
cautious in his approach to the external world. If shading is used as 
a depth impression, it is supposed to indicate an inner stirring, some 
type of anxiety. Besides these two uses of shading, Klopfer and 
Kelley indicate that it is important for us to distinguish between a 
differentiated use of shading and an undifferentiated use of shading. 
A differentiated use of shading is supposed to indicate caution and 
a sense of refined control. In contrast, an undifferentiated use of 
shading is supposed to indicate that the subject’s emotional reac- 
tions are vague and general. 

Movement. The use of movement is supposed to indicate a rich,
inner associative life. This movement can be human or humanlike. 
It can be visualized as human action taking place or as the live 
posture of some living figure. Other modes of seeing human action 
are also possible, but those we have listed are those most frequently 
encountered. Animal action can be visualized, again, in the whole 
blot or in part of it. It can be seen as the live posture of a living 
animal, as action apparently occurring in only part of the animal, or 
as action induced by natural forces. When movement is
[...] color predominates over
movement, the subject is said to be controlled by outer, extrotensive 
tendencies. 

Originality. Finally, we must concern ourselves with the
originality of Carmen’s responses. Did she give unique responses? Or
did she give responses typical of those given by other individuals?
The examiner makes no comment on 25 of Carmen’s responses. Two
of the other five he classes as original, and three as popular. This
is not very many, and Klopfer and Kelley state that “the use of
less than four popular concepts indicates a lack of conformity on the
part of the subject.”


INTERPRETATION


To summarize our discussion thus far, we find that we can analyze
Rorschach responses in terms of their content, time relationships,
location, determinants, and originality. These analyses enable us to
gauge, say the Rorschachers, the subject’s characteristic mental
approach to his problems, the extent to which he possesses
introversive vs. extroversive personality habits, the type and degree of
control he exercises over his spontaneous impulses, and his degree of
emotional adjustment and maturity.

To determine each of the foregoing, it is necessary for the inter- 
preter to concern himself with the quantitative results of the re- 
sponses, with their configuration, with the distribution of responses 
for all 10 cards, with a sequence analysis of the scoring list, with a 
qualitative analysis of all individual responses, with an analysis 
of the general symbolic character of the content of the responses, and 
with an analysis of any conspicuous behavior while the subject was 
responding to the cards. 

A basic tenet in Rorschach theory is that human beings are 
subject to two sets of forces: those arising from within and those 
impinging upon us from without. It becomes a matter of importance, 
therefore, for the Rorschach test to assess the relative strengths of 
these two sets of forces and to determine in what ways and how 
effectively a subject balances and controls these forces. 

The promptings from within are exhibited, say the Rorschachers,
in the attribution of movement, particularly human movement.
Thus a marked introvert should give a large number of action
concepts. This on the theory that to attribute movement requires
the exercise of imagination, and imagination comes from one’s inner
promptings.

A tendency to be responsive to external stimulation is exhibited,
say the Rorschachers, in a subject’s reactions involving color. On
this theory, a marked extrovert should give a large number of color
responses. This brings us to the point where we can see that a
measure of extroversion-introversion should be contained in the ratio of
color and movement responses. When the number of movement
responses is divided by the number of color responses, we have a
formula in which large values indicate introversion and small values
indicate extroversion. Klopfer and Kelley do not accept this formula
as the sole measure of introversion-extroversion, but we need not
concern ourselves here with their reservations.
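
The ratio just described is simple enough to work through on Carmen's record. The following is a minimal sketch, not a scoring procedure taken from the Rorschach literature: the function name is hypothetical, and it assumes that all three kinds of movement in Table 161 are pooled as "movement" and that the three color categories are pooled as "color".

```python
def movement_color_ratio(movement: int, color: int) -> float:
    """Number of movement responses divided by number of color responses.

    Large values are read, on the theory described above, as introversion;
    small values as extroversion.
    """
    if color == 0:
        raise ValueError("ratio is undefined when there are no color responses")
    return movement / color


# Counts from Table 161 (pooling of categories is an assumption of this sketch).
movement_total = 8 + 1 + 2   # human, animal, and inanimate movement
color_total = 2 + 2 + 1      # form with color, color with form, color only

ratio = movement_color_ratio(movement_total, color_total)
print(f"movement/color ratio: {ratio:.2f}")
```

If only human movement were counted, the numerator would be 8 instead of 11; the text does not say which convention is intended.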

Evidence for the type and degree of control which a subject can 
exercise over his spontaneous impulses is to be found, say Klopfer 
and Kelley, in responses based on form. There are three types of 
control we need to consider: outer, inner, and constrictive. Outer 
control is evidenced by the use of concepts utilizing both form and 
color but in which form is the dominant determinant. Inner control 
is evidenced by the use of concepts implying human movement. And 
finally, constrictive control is evidenced by the use of form alone. The 
subject who uses form alone is supposed to be showing that he is 
imposing a rigid intellectual control, amounting almost to repres- 
sion, of the external stimuli evidenced in color responses and of the 
internal promptings evidenced in movement responses. 

Adjustment and maturity are indicated, say the Rorschachers, by
the ratio obtained from dividing the number of human responses
(including human details) by the total number of responses. The
greater the proportion of human responses, in relation to the total
number of responses, the greater the degree of adjustment and
maturity, according to the Rorschach expert. Coupled with this
measure of adjustment and maturity are several other indices. One
is a measure of inferiority. This is supposed to be reflected in the
proportion of responses involving wholes. The greater the proportion
of concepts involving details, the greater the feelings of inferiority.
[...] when a subject gives a large
number of concepts related to his field of occupational endeavor, for
example, when an archaeologist gives a large number of [...]
security. A third index is that indicating the presence of escapist
tendencies. This is supposed to be revealed if a subject gives a large
proportion of cartoon and mythological concepts. [...]
might mention the tendency to be overcritical. This is evidenced by
a tendency to give parts of animals when [...]
whole animals. We can conclude [...]
maturity are derived primarily from [...]
of the location of a subject’s responses.


RELIABILITY


The unstructured nature of the Rorschach test would seem to
make it extremely difficult for us to determine its reliability. And,
in fact, some Rorschach experts have taken exactly this stand and
blithely ignore the problem altogether. But if the Rorschach test
is to be a useful instrument of analysis, it must be made to survive
exactly the same standards we impose upon any system of
psychological measurement. Therefore it is imperative that we know the
degree of reliability which can be attached to Rorschach responses.

The literature reveals only a scattering of information on the
reliability of the Rorschach, but that available shows that Rorschach
responses possess, in many instances, as satisfactory a degree of
reliability as do many of the more classic paper-and-pencil tests.
We give in Table 163 a list of the coefficients reported by Hertz,


Table 163. Spearman-Brown Reliability Data for the Rorschach Test*


Variable                                        Reliability

 1. Total number of responses                       .89
 2. Percentage of whole answers                     .84
 3. Percentage of normal detail                     .15
 4. Percentage of rare detail                       .86
 5. Percentage of oligophrenic detail               .81
 6. Percentage of white space detail             [illegible]
 7. Percentage of “good” forms                   [illegible]
 8. Percentage of movement answers               [illegible]
 9. Percentage of chiaroscuro answers               .92
10. Percentage of color responses                   .81
11. Percentage of animal responses                  .83
12. Percentage of human responses                   .86
13. Percentage of anatomy responses              [illegible]
14. Percentage of popular responses              [illegible]


* From Hertz, M. R. The reliability of the Rorschach ink-blot test. J. appl. Psychol., 1934,
18, 461-477.


one of the indefatigable Rorschach workers. These data she secured 
from the correlations between the responses to the five odd-num- 
bered cards and the responses to the five even-numbered cards. Her 
subjects were 100 boys and girls in the Cleveland junior high schools. 
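
The odd-even procedure behind coefficients of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not Hertz's computation: the function names are hypothetical, the five pairs of half-test scores are invented for the example, and the correction applied is the usual Spearman-Brown formula for a test of doubled length, r_full = 2 r_half / (1 + r_half).

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


# Invented scores for five subjects: one score from the odd-numbered cards,
# one from the even-numbered cards (illustrative data only).
odd_cards = [10, 14, 9, 20, 16]
even_cards = [12, 13, 8, 18, 17]

r_half = pearson_r(odd_cards, even_cards)
r_full = spearman_brown(r_half)
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```

For any positive half-test correlation below 1, the corrected coefficient is larger than the uncorrected one, which is why split-half figures are normally reported after correction.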
In addition to these reliability data, Hertz gives the
intercorrelations among a number of the separate score variables. These are
given in Table 164. The intercorrelations are moderate and, for the
most part, lower than the reliability coefficients given in Table
163. This suggests that the several scores do get at different aspects
of personality.


Table 164. Intercorrelations among Several of the Rorschach Variables*


Variable 2 7 $ 4 | $ 6 
| 
= : E | š 
1. Percentage “good” form 
2, Percentage original | 
3. Number of responses... ... 39 | 
4. Percentage animal responses. | —s01 | — 44 28 
Bes SDN i ae Siac nde | 07) gol .29| =a 
Ge Whole: answerstie<.rce eve ensexciawens am | H 339 | 24| — 25 39 
7. Movement answers.......... | 17 J51 4- 29 SWS: | 2 


* From Hertz, M. R. The reliability of the Rorschach ink-blot test. J. appl. Psychol., 1934,
18, 461-477.


VALIDITY 


The number of studies on the validity of the Rorschach test is
sadly out of proportion to the voluminous literature purporting
to show the uses to which the test may be put. This is most
unfortunate and demonstrates that the same standards have not been
applied as rigidly and as consistently to the Rorschach test as they
have been applied to the more conventional type of test. We can
report on three studies, however. Two of these yield positive results
and the other negative results. Thus the Rorschach test may have
value in one type of situation or for one purpose but not in a second
type of situation or for a second purpose. In this respect the
Rorschach test is no different from any other test we have discussed in
this volume.

Munroe. The first study we shall discuss is one reported by Ruth
Munroe. She developed a series of diagnostic signs for maladjustment
and applied these signs to the records of 348 students at
Sarah Lawrence College. Then she compared the adjustment ratings
obtained from the application of these diagnostic signs with academic
standing one year later. She found a relationship of .49 (coefficient
of contingency). This is a definitely significant relationship. The
data presented in Table 165 illustrate Munroe’s results.
Several of the Rorschach signs which Munroe [...]
of maladjustment are given in Table 166. [...]
25 diagnostic signs. In our list we have omitted [...]
the sake of clarity. We can conclude that the Rorschach test, as
scored by Munroe for maladjustment, yields an index of some value
in predicting future academic standing.


Table 165. Rorschach Adjustment Ratings in Relation to Academic Standing*


                      Severe          Moderate          Slight         Adequately
                      problem         problem           problem         adjusted           Total
Academic standing    N      %        N      %          N      %        N      %          N      %

Superior             6     7.6       6   [illeg.]     14    12.9      14   [illeg.]     40   [illeg.]
Satisfactory        24    30.3      34    40.4        76    70.4      52    67.6       186    53.4
Low average         27    34.2      32    38.1        18    16.7       9   [illeg.]     86   [illeg.]
Failing             22    27.9      12   [illeg.]      0     0.0       2     2.6        36    10.3
Total               79   100.0      84   100.0       108   100.0      77   100.0       348   100.0


* From Munroe, R. L. Prediction of the adjustment and academic performance of college
students by a modification of the Rorschach method. Appl. Psychol. Monogr., 1945, No. 7.
Table 166. Rorschach Signs of Maladjustment According to Munroe’s Inspection
Technique*


 1. Failure to make a response to one or more cards
 2. The use of the whole blot in less than 15 per cent of a subject’s responses
 3. A vague or bad whole response
 4. The use of small or rare details
 5. Excessive use of white space
 6. A loose succession
 7. None or very few popular responses
 8. An excessive number, or a poor quality of sex and anatomical responses
 9. A limited range of content
10. Very low per cent of responses using form
11. Vague, bad or overexact form responses
12. Poor responses to shading
13. Absence of human movement
14. Absence of form color responses
15. Excessive color form responses
16. An extremely high or an extremely low color movement ratio


* From Munroe, R. L. Prediction of the adjustment and academic performance of college
students by a modification of the Rorschach method. Appl. Psychol. Monogr., 1945, No. 7.

Eysenck. A test of conformity was devised by Eysenck by a joint
adaptation of the Rorschach ink blots and of some of the principles
of word association exemplified in the Terman-Miles M-F test.
Upon the basis of data compiled by Harrower-Erickson, Eysenck
selected four neurotic and five normal response words for each of the
Rorschach ink blots. These are given in Table 167. Then he
proceeded to show an ink blot to a subject and asked him to rank the
responses in order by writing a 1 after the response which seemed
most like the ink blot and so on down to 9 after the response that
seemed least like the ink blot. This procedure was repeated for each
of the ink blots.

In scoring, only the four neurotic answers were considered.
Eysenck theorized that the completely consistent neurotic patient
would rank the neurotic answers 1st, 2d, 3d, and 4th but that the
completely consistent normal person would rank them 6th, 7th, 8th, and
9th. Summing these ranks, we find that the neurotic person would


Table 167. Items in the Ranking Rorschach Test*


Neurotic Answers                    Normal Answers

Card 1:
  Mud and dirt                      An army or navy emblem
  An X-ray picture                  A bat
  A dirty mess                      Two people
  Part of my body                   A pelvis
                                    Pinchers of a crab

Card 2:
  An insect somebody stepped on     Two scottie dogs
  A blood-stained spinal column     Little faces on the sides
  A bursting bomb                   A white top
  Black and red                     Two elephants
                                    Two clowns

Card 3:
  Meat in a butcher’s shop          Two birds
  Part of my body                   Two men
  Red and black                     A colored butterfly
  Spots of blood or paint           Monkeys hanging by their tails
                                    A red bow-tie

Card 4:
  Lungs and chest                   Head of an animal
  A nasty mess                      A pair of boots
  Black smoke and dirt              A man in a fur coat
  An X-ray picture                  An animal skull
                                    A big gorilla

Card 5:
  A smashed body                    An alligator’s head
  An X-ray picture                  A fan dancer
  Lungs and chest                   Legs
  Black clouds                      A bat or butterfly
                                    A pair of pliers

Card 6:
  An X-ray picture                  Two king’s heads with crowns
  Sex organs                        Pagan idol on a pole
  Mud and water                     A fur rug
  A gray smudge                     A polished post
                                    A turtle

Card 7:
  Smoke or clouds                   Animals or animal heads
  Part of my body                   Two women talking
  Dirty ice and snow                A map
  An X-ray picture                  Lambs’ tails or feathers
                                    Bookends

Card 8:
  An X-ray picture                  Flower or leaves
  Pink, blue and orange             A horseshoe crab
  Fire and ice, life and death      A colored coat of arms
  Parts of my body                  Two animals
                                    Blue flags

Card 9:
  Red, green and orange             Sea horses, or lobsters
  Parts of my body                  Flowers or underwater vegetation
  Smoke, flames or an explosion     Deer or horns of a deer
  Clouds with blood                 Two people—witches or Santa Clauses
                                    A candle

Card 10:
  Spilt paint                       Two people
  An X-ray picture                  A Chinese print
  Red, blue and green               Spiders, caterpillars, crabs and insects
  Parts of my inside                A colored chart or map
                                    A flower garden or gay tropical fish


* From Eysenck, H. J. Dimensions of Personality. London: Routledge and Kegan Paul, Ltd.,
1947.


get a score of 10 and that the normal person would get a score of
30. Since there are 10 cards in the Rorschach series, the most neurotic
score would be 100 and the most normal score would be 300. Scored


Table 168. Comparison of High and Low Score Groups on the Ranking Rorschach
Test*


Item                          Good Rorschach       Poor Rorschach
                              group, per cent      group, per cent

N.C.O. status                       34                   18
Abnormality in parents              20                   44
Abnormal sex activity               16                   32
Unstable                            30                   46
Weak, dependent                     30                   58
Aggressive                          12                   24
Anxious                             26                   54
[illegible]                         12                   56


* From Eysenck, H. J. Dimensions of Personality. London: Routledge and Kegan Paul, Ltd.,
1947.


in this way, the test is found to have a split-half reliability, corrected
by the Spearman-Brown Prophecy Formula, of .84.
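
The scoring rule for the ranking test can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions rather than Eysenck's own procedure: the function names are hypothetical, and the input is taken to be, for each card, the four ranks (from 1 to 9) that a subject assigned to the neurotic answers.

```python
def card_score(neurotic_ranks):
    """Sum of the ranks given to the four neurotic answers on one card."""
    ranks = list(neurotic_ranks)
    if len(ranks) != 4 or len(set(ranks)) != 4:
        raise ValueError("expected four distinct ranks, one per neurotic answer")
    if not all(1 <= r <= 9 for r in ranks):
        raise ValueError("ranks must lie between 1 and 9")
    return sum(ranks)


def total_score(cards):
    """Sum of the card scores over all cards in the series."""
    return sum(card_score(ranks) for ranks in cards)


# The two limiting cases described in the text, over the 10 cards.
most_neurotic = total_score([[1, 2, 3, 4]] * 10)
most_normal = total_score([[6, 7, 8, 9]] * 10)
print(most_neurotic, most_normal)
```

The two limiting totals reproduce the figures of 100 and 300 given in the text; every other pattern of ranks the text considers falls between them.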

Eysenck selected 50 cases with the highest scores and 50 cases 
with the lowest scores and compared them on a number of per- 
sonality items. The results are given in Table 168. They show that 
the test scores are related to a number of items commonly thought 
to be indicative of a neurotic type of personality. Eysenck also 
compared a hospitalized (neurotic) group with a nonhospitalized 
(normal) group and found a significant difference between their 
respective mean scores.

Kurtz. The third study we wish to report is one on the use of the
Rorschach test for the selection of sales managers in the life
insurance business. Kurtz, who supervised the study, shows clearly how
an investigator can go astray if he does not adhere rigidly to sound
principles of statistics and experimental psychology. In this study
the Rorschach test was administered by Dr. Helen Margulies Mehr,
a psychologist and a Rorschach expert. She administered the test
to 80 sales managers in eight life insurance companies. Of these
managers, 42 were considered successful and 38 were considered
unsuccessful. Established methods of scoring the Rorschach test
failed to yield any differentiation whatever between these two
groups of managers. Therefore, Dr. Mehr made a special study of
the responses made by these two groups of managers and developed
a scoring key based upon 16 responses on which these two groups of
managers differed from each other. The scoring key assigned a weight
of +1 for any response more frequently characteristic of the
successful managers and a weight of -1 for [...] This scoring
system, [...] applied to the records of the 80 managers, made possible
[...] classification of all but one of [...]

To many investigators this would seem to be a most striking
demonstration of the value of the test as a selective device for sales
managers. But let us inquire further into the matter with Kurtz
and see why we cannot accept this result as given. Kurtz gives an
example which should make the matter clear. He says:

Suppose that eight good managers and one poor manager are of Irish or part
Irish ancestry. If we “score” these people on ancestry [...]
eight good managers will
receive a score of 1 and only one poor manager will receive such a score. This holds
regardless of whether there is a real relationship or whether, due to chance, there
happen this time to be a few more Irish men in one group than the other.

The real test, continues Kurtz, “is not whether the scoring system
will work on the original group but whether it will work on other
groups.” In the present instance the scoring was extended to 41
additional cases. The new cases were tested by Rorschach experts,
who were not informed as to which of the managers were successful
and which were unsuccessful, and they sent their analyses to Dr.
Kurtz. The results are given in Table 169.


Table 169. A Comparison of Successful and Unsuccessful Managers on the Rorschach
Test*


Rorschach        20 poor         21 good
scores           managers        managers

 4,  5              3               2
 2,  3              4               7
 0,  1              4               3
-1, -2              8               8
-3, -4              1               1


* From Kurtz, A. K. A research test of the Rorschach test. Person. Psychol., 1948, 1, 41-51.
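
The logic of a plus-and-minus scoring key, and of Kurtz's insistence on trying it out on fresh cases, can be sketched in a few lines. Everything specific in this sketch is hypothetical: the sign names, the weights, the cutoff of zero, and both sets of records are invented for illustration and are not Dr. Mehr's key or data.

```python
def score_record(signs_present, key):
    """Sum the weights of the keyed signs that occur in one record."""
    return sum(weight for sign, weight in key.items() if sign in signs_present)


def hit_rate(records, key, cutoff=0):
    """Share of records where 'score above cutoff' agrees with actual success."""
    hits = sum(
        (score_record(signs, key) > cutoff) == successful
        for signs, successful in records
    )
    return hits / len(records)


# Hypothetical key: +1 for signs seen more often among successful managers,
# -1 for signs seen more often among unsuccessful ones.
key = {"sign_a": +1, "sign_b": +1, "sign_c": -1, "sign_d": -1}

# Invented records: (set of signs present, whether the manager was successful).
original_group = [
    ({"sign_a", "sign_b"}, True),
    ({"sign_a"}, True),
    ({"sign_c"}, False),
    ({"sign_c", "sign_d"}, False),
]
new_group = [
    ({"sign_a", "sign_c"}, True),
    ({"sign_b"}, False),
    ({"sign_d"}, False),
    ({"sign_c"}, True),
]

print("original group:", hit_rate(original_group, key))
print("new group:     ", hit_rate(new_group, key))
```

A key fits the group it was built on almost by construction, so only the figure for the new group says anything about validity. That is the comparison the additional 41 cases provide.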


Expressed as a Pearson r, the relationship is negligible, being .02.
Lest there be some question as to whether success can be predicted
at all, Kurtz reported that a 13-item experience record gives a
correlation of .48 for this same group of cases. Table 170 supplies
additional details.

Subsequent to this further tryout, Dr. Mehr reanalyzed the
records, and, using not only the original 32 signs but also any others

Table 170. A Comparison of Successful and Unsuccessful Managers on an Experience
Record*


an 20 good 20 poor 
Experience scores | 
| managers | managers 


60 and up x aval 1 0 
E a sn a 0 a 
a er 3 6 
45-49 | 8 t 
40—44.. | 8 2 


* From Kurtz, A. K. A research test of the Rorschach test. Person. Psychol., 1948, 1, 41-51.


she wished, she made a further classification with the results
presented in Table 171. The correlation between these ratings and


Table 171. A Second Comparison of Successful and Unsuccessful Managers on the
Rorschach Test*


20 good | 20 poor 
managers | managers 


Rating by Dr. Mehr 


wu 


Less confident of failure 
Confident of failure... . 


o = e 


* From Kurtz, A. K. A research test of the Rorschach test. Person. Psychol., 1948, 1, 41-51. 


success is higher (.17), but it is still not significant and still does not
compete with the correlation of .48 yielded by the experience record.
Kurtz further points out that age alone yields a higher correlation
(.31) with success than do the Rorschach test scores.

Kurtz, in this study, shows the utter folly of judging the validity
of a scoring key upon the basis of its application to the groups on
which it was developed. He also shows that even with tremendous
effort the Rorschach test could not, in this instance, be made to
yield as valid predictions as two alternate, and more objective,
methods.

The point of reviewing this study in detail is to demonstrate the
kind of study needed before we can assert that any psychological
test yields valid results. This kind of study is [...]
nonexistent in the voluminous Rorschach literature. We cannot accept,
therefore, many of the claims of its partisan adherents. Admittedly,
it is not an easy matter to secure the type of criterion which is needed
for such validation studies, but difficult though this is, it does
not make it permissible for us to claim validity in the absence of
substantiating data.

The discussion in this section should in no way be taken as
implying that small contrasting [...]

13 


PROJECTIVE TECHNIQUES: AN IMAGINAL APPROACH


We saw in the last chapter how Rorschach, and his ardent band of
disciples, attempt to derive insights into personality structure by
analyzing our responses to a series of meaningless ink blots. We are
now to see how Morgan and Murray, and their followers, attempt
to gain similar insights by analyzing our responses to a series of
ambiguous pictures.

This is done through the medium of the Thematic Apperception
Test, a test of a subject’s power of imagination. This test consists
of a series of 19 pictures and one blank card. These are shown one
by one to a subject who is asked to make up a story based on what
he sees in each card. These stories become the raw material for our
analysis.


CONTENT


Each picture in the Thematic Apperception Test is capable of
eliciting a wide variety of interpretations. This is one of the two
basic facts which give the test its value. The second “fact” is our
tendency to interpret ambiguous situations in directions conforming
to our own present wants and to past behavior. One of the pictures
in the Thematic Apperception Test is that of a little boy with a
violin. In looking at the picture, we can all agree that this is a
picture of a small boy with a violin. But we shall differ in our
interpretation of what this picture represents. One person will see a boy
longing to become a great musician. A second person will see him
musing over some piece he has just played. A third person will feel
that the boy is frustrated in not being able to play correctly a
particularly difficult measure. And a fourth person will see a boy
who resents having to practice his lesson because he would rather
be outside playing baseball. In these different stories we are to look
for our clues to personality.

The pictures in the Thematic Apperception Test are listed in
Table 172. Eleven of the pictures are suitable for both men and


Table 172. Pictures in the Thematic Apperception Test*


First series:

 1. A young boy is contemplating a violin which rests on a table in front of him.
 2. Country scene: in the foreground is a young woman with books in her hand; in the
    background a man is working in the fields and an older woman is looking on.
 3. (BM)† On the floor against a couch is the huddled form of a boy with his head bowed on
    his right arm. Beside him on the floor is a revolver.
 3. (GF) A young woman is standing with downcast head, her face covered with her right
    hand. Her left arm is stretched forward against a wooden door.
 4. A middle-aged woman is standing on the threshold of a half-opened door looking into
    a room.
 5. (BM) A short elderly woman stands with her back turned to a tall young man. The
    latter is looking downward with a perplexed expression.
 5. (GF) A young woman sitting on the edge of a sofa looks back over her shoulder at an
    older man with a pipe in his mouth who seems to be addressing her.
 6. (BM) A gray-haired man is looking at a younger man who is sullenly staring into
    space.
 6. (GF) An older woman is sitting on a sofa close beside a girl, speaking or reading to her.
    The girl, who holds a doll in her lap, is looking away.
 7. (BM) An adolescent boy looks straight out of the picture. The barrel of a rifle is
    visible at one side, and in the background is the dim scene of a surgical operation, like a
    reverie-image.
 7. (GF) A young woman sits with her chin in her hand looking off into space.
 8. (BM) Four men in overalls are lying on the grass taking it easy.
 8. (GF) A young woman with a magazine and a purse in her hand looks from behind a tree
    at another young woman in a party dress running along a beach.
 9. A young woman’s head against a man’s shoulder.

Second series:

10. A road skirting a deep chasm between high cliffs. On the road in the distance are
    obscure figures. Protruding from the rocky wall on one side is the long head and neck of
    a dragon.
11. (M) A young man is lying on a couch with his eyes closed. Leaning over him is the
    gaunt form of an elderly man, his hand stretched out above the face of the reclining
    figure.
11. (F) The portrait of a young woman. A weird old woman with a shawl over her head is
    grimacing in the background.
12. (BG) A rowboat is drawn up on the bank of a woodland stream. There are no human
    figures in the picture.
12. (MF) A young man is standing with downcast head buried in his arm. Behind him is
    the figure of a woman lying in bed.
13. (B) A little boy is sitting on the doorstep of a log cabin.
13. (G) A little girl is climbing a winding flight of stairs.


14. The silhouette of a man (or woman) against a bright window. The rest of the picture is 
totally black. 

15. A gaunt man with clenched hands is standing among gravestones. 

16. A blank card. 

17. (BM) A naked man is clinging to a rope. He is in the act of climbing up or down. 

17. (GF) A bridge over water. A female figure leans over the railing. In the background are 


tall buildings and small figures of men. 
18. (BM) A man is clutched from behind by three hands. The figures of his antagonists are 
invisible. 
18. (GF) A woman has her hands squeezed around the throat of another woman whom she 
vards across the banister of a stairway. 
ons overhanging a snowcovered cabin in the country. 
nan (or woman) in the dead of night leaning against a 


appears to be pushing back 
19. A weird picture of cloud formati 
20. The dimly illumined figure of a n 


lamp post. 


* From Murray, H. A. Thematic Apperception Test Manual. Cambridge, Mass.: Harvard 


University Press, 1943. 
t Initials indicate whether picture is suitable for boys (B), male adults (M), girls (G), or 
female adults (F). If no initials, there are no restrictions. 
women. Ten are used with men alone, and ten are used with women 
test are those which Murray and Morgan 
ibuting “to the total personality picture.” 
of a number of 
d. After this 


ures on their 


alone. The pictures in the 
found most useful in contr 
They were selected after the “entire” personalities 
subjects had been intensely and thoroughly studie 


study, Murray and his coworkers rated a series of pict 
to the over-all personality picture. 


effectiveness in contributing 
as the elements 1n 


>; 5 : ; ; 
Pictures with the highest ratings were retained 
the Thematic Apperception Test. 


DIRECTIONS 


When the pictures are presented to a subject, they are divided
into two series of ten pictures each. The subject reacts to the two
series in separate one-hour sessions, one or more days apart.
The first series of pictures is more commonplace than the second
and is presented to the subject with the following directions:


This is a test of imagination, one form of intelligence. I am going to show you
some pictures, one at a time; and your task will be to make up as dramatic a story
as you can for each. Tell what has led up to the event shown in the picture, describe
what is happening at the moment, what the characters are feeling and thinking;
and then give the outcome. Speak your thoughts as they come to your mind. Do
you understand? Since you have fifty minutes for ten pictures, you can devote about
five minutes to each story. Here is the first picture.


These are the instructions for the second day:


The procedure today is the same as before, only this time you can give freer 
rein to your imagination. Your first ten stories were excellent, but you confined 
yourself pretty much to the facts of everyday life. Now I would like to see what you 
can do when you disregard the commonplace realities and let your imagination 
have its way, as in a myth, fairy story, or allegory. Here is Picture No. 1. 

When the subject comes to Card 16, the blank card, the examiner says, “See
what you can see on this blank card. Imagine some picture there and describe
it to me in detail.” Following this description, the examiner says, “Now tell me
a story about it.”


During each of the two sessions the individual is seated in a chair
or is stretched out on a couch, preferably with his back to the
examiner. During most of the two sessions the examiner makes no
comment. However, after the first picture in the first series, it is
desirable, by way of encouraging the subject, to compliment him
upon his story and to remind him, if necessary, of the directions.
During the remainder of the two sessions he indicates to the subject
whether he is ahead of or behind schedule, he encourages him with
frequent praise, he calls the subject’s attention to any important
omission, such as an omitted outcome of a story, he reminds the
subject to stick to the major plot of a story (if he is inclined to give . . .

The examiner records . . . possible, he has a stenographer . . .
subject’s comments are phonographically recorded. At the . . .
of the first session he arranges . . . second appointment, but he . . .
magazines, newspapers, radio, and so . . .

In order to interpret the subject’s stories, it is necessary that the
examiner have a good background in clinical experience, in observing
individuals, in interviewing, and in testing. He should also have some
knowledge of psychoanalysis, and he must have months of training 
in the specific technique of interpreting the stories secured in re- 
sponse to the pictures in the Thematic Apperception Test. In this 
text we can make only a superficial examination of the procedures
used, and no one should expect to be able from our comments to 
become an expert user of the Thematic Apperception Test. All we 
can do is to indicate the types of analysis which are made and 
thereby give the reader something of the flavor, but not of the 
substance, of the Thematic Apperception Testing technique. 

To make his analysis the examiner must know the sex and age of 
his subject, whether his parents are living, dead, or separated, the 
age and sex of his siblings, the subject’s vocation, and his marital 
status. With these facts in hand the examiner begins his analysis 
of the subject’s stories. He breaks each story down into a series of 
successive events and then looks in each of these events for the 
force or forces emanating from the “heroes” of the stories and for 
the force or forces emanating from these heroes’ environment. To 
do this the examiner must find in each story the hero or other charac- 
ter with whom the subject identifies himself. He must analyze the 
motives, trends, and feelings (that is, the needs) of this hero or 
principal character. He must locate the environmental forces that 
act or press upon this hero. And finally, he must compare these 
needs and presses to determine the outcome. Thus the examiner 
must concern himself with four things in each story: the hero, his 
needs, his environmental press, and the outcome. 

The Hero. Finding the hero or principal character with which 
the subject identifies himself is not always an easy matter, but the 
following possibilities should be carefully considered: 


1. That the subject identifies himself with the character in whom he shows the
greatest interest or whose point of view he adopts

2. That the subject identifies himself with the character whose feelings and
motives are most intimately portrayed

3. That the subject identifies himself with the character who most resembles
himself in sex, age, status, or role

4. That the subject identifies himself with the character most concerned in the
outcome of the story.


Each of these types of identification is fairly straightforward. But 
since they are not exhaustive of all possibilities, those of a more
complex nature must also be considered. Some of these more complex
possibilities are indicated below. 


1. The subject may shift his identification as the story unfolds.

2. The subject may continually identify himself with two characters, one
representing one element in his personality and the other representing a second element.

3. The subject may tell a story that tells a story. In this case there may be a
primary and a secondary hero. The subject may identify himself with one or with
both of these heroes or with neither of them.

4. The subject may identify himself with a character of the opposite sex. A man
with a high feminine component may identify himself with a woman, and a woman
with a high masculine component may identify herself with a man.

5. There may be no identification at all. There may be no hero or chief
character in the story, or if there is, he may be perceived as a part of the subject’s
environment.


Needs. To determine motives, trends, and feelings of heroes, the 
examiner must observe everything the hero or principal character 
feels, thinks, and does. In particular, he must note everything 
unusual, uncommon, or unique. Murray states that the Thematic 
Apperception Test requires no one particular theory of personality, 
but he makes use of his own conceptual scheme consisting of a 
catalogue of 28 needs. Included in Murray’s list are abasement, 
achievement, aggression, dominance, nurturance, passivity, sex, and 
succorance. 

In the stories elicited by the pictures, these needs are to be dis- 
covered as impulses, wishes, or intentions or are to be found in the 
overt behavior of the heroes of the stories. When we discover these 
needs, we need to know their strength. Murray suggests we do this 
by means of a five-step scale of need intensity. We can, however, 
use as many or as few steps as we wish. In any event, to determine 
the intensity of a need we must consider its duration, the frequency 
of its occurrence, and its importance in the plot of the story. When 
all needs have been discovered and rated, those that appear
unusually strong and unusually weak are listed for further consideration
and analysis.
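
The bookkeeping behind this last step, totaling the ratings and listing the
unusually strong and unusually weak needs, can be illustrated with a short
sketch in Python. It is only an illustration under stated assumptions and is
not a part of Murray's procedure: the function names, the norm values, and
the cutoff of 1.5 standard deviations are hypothetical, and we assume that a
trained examiner has already rated each need in each story on a scale of
1 to 5.

```python
from collections import defaultdict


def total_need_scores(story_ratings):
    """Sum the 1-to-5 intensity ratings of each need over all stories.

    story_ratings: one dict per story, mapping a need name to the rating
    the examiner gave it. Needs absent from a story are simply omitted.
    """
    totals = defaultdict(int)
    for ratings in story_ratings:
        for need, strength in ratings.items():
            if not 1 <= strength <= 5:
                raise ValueError(f"rating for {need!r} must be 1..5")
            totals[need] += strength
    return dict(totals)


def flag_unusual(totals, norms, cutoff=1.5):
    """Split needs into unusually strong and unusually weak lists.

    norms is a HYPOTHETICAL table {need: (mean_total, sd_total)}; a need is
    flagged when its total lies at least `cutoff` SDs from the norm mean.
    """
    strong, weak = [], []
    for need, (mean, sd) in norms.items():
        z = (totals.get(need, 0) - mean) / sd
        if z >= cutoff:
            strong.append(need)
        elif z <= -cutoff:
            weak.append(need)
    return sorted(strong), sorted(weak)


# Illustrative ratings for three stories (invented for this sketch).
stories = [
    {"achievement": 5, "aggression": 1},
    {"achievement": 4},
    {"achievement": 5, "nurturance": 2},
]
norms = {
    "achievement": (6.0, 2.0),
    "aggression": (6.0, 2.0),
    "nurturance": (3.0, 2.0),
}
totals = total_need_scores(stories)
strong, weak = flag_unusual(totals, norms)
```

The sketch deliberately leaves the judgment of duration, frequency, and
importance to the examiner; it only shows how ratings, once made, might be
totaled and compared against a standard.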


Press. When we have determined needs, we turn our attention to
the analysis of the press which operate upon the heroes of our
subject’s stories. To make this analysis we must observe the details
as well as the general nature of all situations in which the heroes
find themselves. And we must note, in particular, the uniqueness,
frequency, and intensity or the complete absence of any
environmental press. Among the press which Murray says we shall most
frequently have to consider are affiliation, aggression, dominance,
nurturance, and rejection. When the press operating upon our
subject’s heroes have been determined, we must, as we did in the
case of needs, estimate their relative strengths.

Outcomes. We must now compare needs and press and determine 
how the forces emanating from each hero compare in strength with 
the forces emanating from his environment. The questions to be 
considered in this connection, as given by Murray, are as follows: 


1. How much force does the hero manifest?

2. What is the strength of the facilitating or beneficial forces of the
environment as compared to the opposing or harmful forces?

3. Is the hero’s path of achievement difficult or easy?

4. In the face of opposition does he strive with renewed vigor or does he collapse?

5. Does the hero make things happen or do things happen to him?

6. To what extent does he manipulate or overcome the opposing forces and to
what extent is he manipulated or overcome by them?

7. Is he coercing or coerced?

8. Is he mostly active or passive?

9. Under what conditions does he succeed? When others help him? Or when he
strives alone?

10. Under what conditions does he fail?


When we have considered . . . and have found the answers to the
preceding questions, we have to make a list of all major simple and
complex outcomes. A simple outcome is any combination of, or
interaction between, a need and a press. A complex outcome is any
combination of two or more simple outcomes.

INTERPRETATION 


We now have before us a list of high and low needs, a list of high
and low press, and a list of outcomes. We also have with us two
assumptions. One is that the attributes of the hero represent
tendencies, traits, or sentiments in the subject’s personality, and the
second is that the press represent forces in the subject’s environment.
But we must not take these assumptions as proved facts. We must
consider them as leads or as working hypotheses, to be verified or
disproved by other methods of analysis.

In arriving at our conclusions from Thematic Apperception Test
material, Murray suggests that we give careful consideration to the
following factors:


The Manner in Which the Test Is Administered. If the test is not properly given,
if the subject does not become involved in the stories, or if the stories are short and
sketchy, the test is not apt to yield any material of significance. On the average,
Murray says, about one-third of the stories will be barren of meaning anyway. So
anything which will lower the number of significant stories will prove a serious
detriment to the analysis.

The Subject’s Judgment. A most important part of the Thematic Apperception
Test analysis is that based upon the inquiry into the sources of the subject’s stories.
Murray suggests that it may not always prove the wisest course for the interpreter
to rely solely upon the subject’s judgment. Some of the stories which he will
attribute to newspapers, books, radio, and so forth may, in reality, represent personal
experiences which he may not care to admit. When the source of a story really is a
radio serial, it is important, of course, to discover this fact. But it is equally
important to know when radio as a source is alleged to cover up a more personal
source.

Partial Data. As penetrating as any given Thematic Apperception Test analysis
may seem, it gives only a partial picture of personality. One cannot conclude from a
Thematic Apperception Test analysis that he really knows a subject in the sense
that he would be presumed to know him from an extensive series of depth
interviews or from a complete psychoanalysis.

Level of Function. A person can exhibit physical or verbal behavior, or he can
have ideas, plans, and fantasies. The conduct of the subject gives the first. The
content of the stories gives the second.

Layers in Normal Socialized Personalities. Murray distinguishes between inner,
middle, and outer layers of personality. The inner layer, he says, “is composed of
repressed unconscious tendencies” which are never or rarely expressed either in
thought or action. The middle layer, Murray says, “is composed of tendencies
which appear in thought in undisguised form, and may also be
objectified in action privately and secretly.” “Finally,” he says, “the outer layer is
composed of tendencies which are publicly asserted or acknowledged and [are]
openly manifested in behavior.” The task of the examiner is to determine to which
of these levels each variable noted in the Thematic Apperception Test analysis
belongs. It is ordinarily assumed, however, that the great majority of the variables
in a Thematic Apperception Test analysis will be found characteristic of the
second of the three levels. Overt behavior is not hard to observe and there are many
ways of observing it. The first level is the most difficult to observe, and the only way
to observe it at present is through a series of depth interviews or through a thorough
psychoanalysis.


. . . against the easy and obvious assumption
that the variables unusually high in Thematic Apperception Test stories will be
. . . in personality and, conversely, that variables
unusually weak in Thematic Apperception Test stories will be extremely
. . . This may be true and, if true, should not
be overlooked. But many times it will not be true, and the examiner will be led
seriously astray if he does not probe beyond superficial similarity. It will not be 
at all infrequent for an examiner to find that the Thematic Apperception Test 
analysis shows exactly the opposite of what a subject does or says he does or is 
going to do. In summary, we might say that if the test did not frequently yield 
findings at variance with overt and verbal behavior, it would add nothing of sig- 
nificance to our observation of this overt and verbal behavior. 

Insight into the Third Level of Personality. Murray and Morgan did not devise 
the Thematic Apperception Test to get at the outer layer of personality. But they 
feel that at times it will prove useful in this connection. Murray points out that the 
stories in the first session, being in response to the more commonplace pictures, can 
usually be considered more closely related to the outer personality layer than those 
in the second session. Also, variables or tendencies not restricted by cultural taboos 
or by sanctions are apt to be related to the outer level of personality. 

Sex of Examiner. In Thematic Apperception Test analysis, the sex of the exam- 
iner is important. Therefore, it is necessary for the examiner to make allowance for 
this fact. But in just what manner this allowance is to be effected Murray does not
say.

Present-life Situation. The content of Thematic Apperception Test stories will 
vary with current events, with the status of the subject, with present or momentary 
emotional states, and so forth. The examiner must be able to isolate from the mani- 
fest content of the stories the variables underlying this content. These may appear 
in one garb for one subject and in another garb for another subject. It is important 
that the Thematic Apperception Test examiner is not thrown off guard by superficial
changes in content, changes which have no effect upon the fundamental
mechanisms or dynamics involved.


OBJECTIVITY, RELIABILITY, AND VALIDITY 


It is evident that the Thematic Apperception Test technique
is not one for the amateur. The technique is not objective. It
requires months of training on the part of an examiner. And this
training must consist, when it is complete, of a wide background
of clinical and, preferably, of psychoanalytic experience. This being
the case, there are relatively few psychologists who can be considered
fully qualified to administer and to use the Thematic Apperception
Test in the manner in which Murray and Morgan feel the test should
be used.

Reliability. There have not been many reports on the reliability 
of the Thematic Apperception Test. Murray, in his Manual of 
Directions, gives no reliability data at all. Tomkins, in his book on 
the Thematic Apperception Test, summarizes the few data on 
reliability that are available. These we may briefly list as follows:

1. Sanford had four judges rate ten subjects on their needs and press. The average
intercorrelations between the judges’ ratings were .57 for needs and .54 for press.

2. Harrison and Rotter, independently of each other, rated the protocols of 70
subjects for emotional maturity and stability. These protocols were based on five
pictures. Their ratings intercorrelated .73 or .77 depending upon whether they used
a three- or a five-point rating scale.

3. Combs had four judges rate ten protocols. He participated as one of the judges
and found that his ratings agreed with the average rating assigned by the other
judges to the extent of 60 per cent. He also found that his own reratings, after an
interval of six months, agreed with his original ratings to the extent of 69 per cent.

4. Tomkins tested and retested 45 women subjects at various intervals from two
to ten months. Fifteen subjects were retested after an interval of two months,
15 more after an interval of six months, and the remaining 15 subjects after an
interval of ten months. The stories were rated in terms of Murray’s need-press
schema and gave the following test-retest correlations:


After 2 months ........ .80
After 6 months ........ .60
After 10 months ....... .50


5. Tomkins collected 400 stories told by one subject over a period of ten months.
He divided these stories into two groups of 200 stories each, had each set rated by
different raters, and reported a correlation of .91 between the two series of ratings.

We see that there have been two approaches to the reliability
problem: the comparison of ratings assigned by different raters and
the comparison of test-retest results. The data we have presented
show that the reliability of the Thematic Apperception Test varies,
as does that of every other test, with its length, with the particular
variable being assessed, with the type of rating used, and with the
raters.

. . . whether some of the test-retest . . . Thematic Apperception Test itself
. . . or whether the need-press . . . In this connection it may . . .
purpose of the Thematic Apperception . . . underlying structure of personality.
. . . represent a stable, . . . explanatory principle in human behavior . . .
little hard to accept the possibility that such a fundamental and
significant . . . have a reliability as transitory as that indicated . . .
lower test-retest correlations. Murray would undoubtedly disagree . . .
“that the [Thematic Apperception Test] responses reflect the fleeting mood as well as
the present life situation of the subject. . . . ” And for this reason,
“we should not expect the repeat reliability of the test to be
high.” If we accept this statement, our chief source of reliability
data consists of the intercorrelations among the ratings
independently assigned by different raters. And so far, these too
leave something to be desired.


VALIDITY 


Experiments on the validity of the Thematic Apperception Test 
have, for the most part, been abortive. Attempts have been made to 
relate the results to those secured by the Rorschach test, but this 
necessitates the assumption that the Rorschach test itself is a valid
measuring instrument. Other attempts have consisted of the blind 
matchings of results secured from the Thematic Apperception Test 
with personal-history data, etc. This approach Murray dismisses 
as nothing more than a parlor trick. Also he claims that much of the 
personal-history data is needed as a basis for a complete Thematic
Apperception Test analysis. And to deny this information to the 
examiner is to destroy much of the value in the approach. This 
leaves us with only one avenue of approach to the validity problem, 
that of comparing the results of a Thematic Apperception Test 
analysis to those secured from a thoroughgoing psychoanalysis. 
Murray’s own contention is that the two would be found to agree, 
but few if any data have ever been presented on the point. 

We can conclude that the Thematic Apperception Test, like the 
Rorschach test, represents a unique approach to the measurement 
of personality. Unlike the Rorschach test, however, it was based 
upon a more insightful approach as to what was to be accomplished
and goes much beyond the crude analogy stage so characteristic of 
the Rorschach test. The Thematic Apperception Test leaves much 
to be desired, however, in the way of objectivity, economy, and 
general usefulness to anyone not extremely well versed in the techniques
of psychoanalysis.


14 


PERFORMANCE: OBSERVATIONAL APPROACHES


There is no chapter in this text that does not deal in some way with 
human performance. But in all the preceding chapters we have
started with the performance induced by a specified type of test 
or by a specific set of directions. And the behavior we have produced 
by these specific tests, or by these directions, has been of a verbal 
character. Our subjects have responded with check marks, crosses, 
or circles, or have told us stories, or have described what they see in 
ink blots, or have said (on paper) what they would or would not do 
in certain situations. Or if we have not secured this information from 
our subjects directly, someone else has supplied the information
for us. From the verbal responses induced by our verbal instructions
we have tried in various . . . the prediction and control . . .
he laughs or cries, he plays . . . he stamps his feet, he throws . . .
demonstration, and so forth. We watch . . .


We are going to divide our performance-measuring techniques 
into two categories: observational and experimental. Under the 
first category we shall discuss the techniques which do not require
a rigid delimitation of the subject’s responses. We may carefully
control the conditions under which these responses can take place
but we do not delimit or circumscribe the responses themselves.
Under the second category we shall describe the techniques which
require that we carefully delimit the situation and the responses
also. Both observational and experimental approaches require us
to collect and record facts about perceived behavior. And in both
approaches we must follow some preconceived, prearranged, or
formal plan.

The techniques we plan to discuss in this chapter can be divided
into those suitable for the observation of single individuals and those
suitable for the observation of groups of individuals. We shall begin
our discussion with the former.


OBSERVATION OF SINGLE INDIVIDUALS 


Barker, Dembo, and Lewin’s study of “Frustration and Regres- 
sion” provides an excellent demonstration of the application of a 
technique for the observation of single individuals. And, at the 
same time, it yields abundant proof of the value of starting our 
observations with a clear-cut theoretical formulation of the ends
or goals to be attained.

Barker, Dembo, and Lewin wished to test the theory that frustration
leads a person to behave in an immature manner, in other words,
that frustration leads to regression in behavior. Familiar examples
of this phenomenon are the crying of the teen-age boy who does not
get his expected Christmas bicycle, the temper tantrum of the
adolescent when his friend will not do his bidding, and the
daydreaming of the young boy who finds no satisfaction in his violin
practice.

Creating Frustration. The first problem which Barker, Dembo,
and Lewin tackled was that of devising a method to create
frustration. The method they evolved consisted of the following steps:

Free Play Period. The observer led a child into a playroom. He
demonstrated a set of moderately desirable toys and placed these
toys at the child’s disposal. The child was allowed to play with these
toys for thirty minutes.

Prefrustration Period. Upon a subsequent occasion the child was
led into the playroom (enlarged now by the prior removal of an
opaque partition) where he found a much more elaborate setup than
during his first play period. All the toys from the first period were
incorporated in the more elaborate setup. The child was allowed to
play in this situation from five to fifteen minutes.

Transition Period. The observer collected all play materials that
had been available in the free play situation, distributed them on the
cardboard squares where they had first been found, pulled down a
transparent partition blocking entrance to the more attractive
toys, and said, “Now let’s play at this end [of the room].”

Frustration Period. This was the same as the free play situation
except for the fact that the child now had knowledge of the more
attractive toys. He could see them, but he could not gain access to
them. This period lasted for thirty minutes.

Postfrustration Period. When the observer had concluded his
observations for the frustration period, he raised the partition and
allowed the child to play with the more elaborate toys. This period
continued as long as the child cared to play. This period was
permitted so that no undesirable effects of the frustrating experience
would remain to affect the child’s behavior or personality after the
conclusion of the observations.

Observational Techniques. During each of the periods we have
described, an observer was present at a small table in a corner of the
playroom. After leading the child into the room and doing what was
necessary to “set the stage,” the observer sat at the table “to do
his lessons.” While “doing his lessons,” he made notes concerning
the child’s behavior. He participated as little as possible in the child’s
play activity, but he gave brief answers to any questions which the
child put to him.

In addition to the observer we have already mentioned, a second
observer was stationed behind a one-way vision screen. This second
observer made a running . . . child’s behavior and . . .
seconds so that the time of all observations could be accurately
determined. There was a switch underneath the first observer’s table
so that he could activate one of the pens of the polygraph. This was
used to indicate the beginning and end of any event in which he was
particularly interested.


Barker, Dembo, and Lewin list the following advantages of having
two observers:


1. When the first observer was actively occupied with the child or toys, the second
observer’s record was available.

2. The behavior of the first observer could be recorded by the second observer.

3. The presence of two observers made it possible for each one to concentrate
on different aspects of behavior. This made possible the collection of more and also
better and more careful observations. The first observer emphasized the activities
of the child while the second observer emphasized the conversation and general
meaning of what was happening.

4. The availability of two observers permitted their roles to be interchanged.
This minimized the effects caused by the biases, attitudes, and personality
characteristics of each observer.


Measuring Regression. The second problem which Barker,
Dembo, and Lewin tackled was that of devising a method for the
measurement of regression. The records available for analysis
consisted of the running accounts of the two observers. These were
combined into one integrated account, and this account was
subjected to study and analysis. The first step in this analysis was that
of dividing up the complete running account into meaningful “units
of action.” This meant that each record had to be examined for a
complete or fairly complete unit of play and divided into as many
units of action as there appeared to be meaningful and relatively
complete play sequences. An example of such a unit is as follows:


The child picks up the teddy bear and pulls the truck and trailer. Hauls the doll, 
the phone, and the teapot. “Teddy bear, teddy bear, you stay right here.” She 
shows off, talks, and looks at observer. Pushes truck and trailer into middle of the 
room, makes a noise, “rrrr.” “Oh, teddy! You are going to sleep.” The load falls 
off the truck and trailer. Reloads teddy bear and doll, whispering. 


Barker, Dembo, and Lewin classified these units of action into 
occupation with accessible goals and occupation with inaccessible 
goals. In the first instance, they made note of the actions which the 
child actually performed, and in the second instance, they made 
note of his attempts to get at the inaccessible toys. 

Under Category 1, occupation with accessible goals, Barker, 
Dembo, and Lewin distinguished five types of behavior: 


1. Playing with accessible toys 

2. Diversions with non-toy objects: activities with experimenter, activities at 
window, and so forth 

3. Island behavior: playing with objects not intended to be part of the experi- 
mental set-up, such as stray pins found on the floor 

4. Looking and wandering about 

5. Disturbances and activities created by outside noise, and so forth 


Under Category 2, occupation with inaccessible toys, Barker, 
Dembo, and Lewin noted the following types of activity. 

1. Physical approaches to the inaccessible regions, kicking the floor, and so 
forth 


2. Social attempts by means of threats, pleadings, requests, coaxing, and so 
forth 


3. Passive directed actions such as looking at or talking about inaccessible toys 


Barker, Dembo, and Lewin now wished to evaluate these play 
units from the standpoint of their constructiveness. They desired 
to do this because they felt that constructiveness of play could be 
taken as a sensitive indicator of regressive behavior. They felt this 
in view of their premises that a child’s entire mind, personality, and 
behavior are involved or reflected in his free play activity and that 
fantasy and realistic judgment are closely interwoven in any con- 
structive action. Barker, Dembo, and Lewin describe their evaluation 
procedure as follows: 


[. . .] their increasing constructiveness. No [. . .] The resulting order 
represented the [. . .] discussion, disagreement, and compromise [. . .] 
a priori theories of “constructiveness,” it was possible to agree upon the relative 
ranking of different play with the same toys. 

The play units were briefly characterized and the characterizations set down in 
tabular form. . . . Each rank order was assigned a numerical weight which in the 
final scale ranged from 2 to 8. . . . The records of the six children were then scored 
by assigning a numerical value to each consecutive play unit in the record in 
accordance with the rating given in the scale, weighted for the duration of the unit 
by multiplying by the time. The mean constructiveness of each child’s play was 
determined by summing these values for the whole record and dividing by the total 
duration of play. . . . 

Using the constructiveness scale as thus devised, one of the raters scored the 
remaining records. The items of the original scale covered the great majority of the 
units which occurred. However, it was inevitable that a number of unrated units 
should occur in the other records. When this happened, the three raters considered 
the new unit and agreed upon its placement in the scale. 
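
The scoring rule quoted above (each unit's rating multiplied by its duration, summed, and divided by the total duration of play) is a duration-weighted mean. The following is a minimal sketch in Python, not the authors' own procedure. The function name and the sample record are hypothetical; the sample borrows ratings and durations from the illustrative units listed for the scale, but it is not an actual child's record.

```python
def mean_constructiveness(units):
    """Duration-weighted mean constructiveness for one play record.

    `units` is a list of (rating, duration_in_seconds) pairs, one per
    consecutive play unit. Ratings are assumed to lie on the 2-to-8 scale.
    """
    if not units:
        raise ValueError("record contains no play units")
    for rating, duration in units:
        if not 2 <= rating <= 8:
            raise ValueError(f"rating {rating} is outside the 2-to-8 scale")
        if duration <= 0:
            raise ValueError("durations must be positive")
    # Weight each rating by the time the unit lasted, then divide by total time.
    weighted_sum = sum(rating * duration for rating, duration in units)
    total_time = sum(duration for _, duration in units)
    return weighted_sum / total_time


# Hypothetical record: four play units as (rating, seconds).
record = [(2, 10), (3, 25), (6, 60), (7, 225)]
print(mean_constructiveness(record))  # 2030 / 320 = 6.34375
```

Because long units carry more weight than short ones, a child who spends most of the period in elaborate play receives a high mean even if several brief, superficial units also occur.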
The observers made their ratings on the assumption that a con- 
structiveness of play continuum would extend from primitive, 
simple, little-structured activity to elaborate, imaginative, and 
highly developed play. Seven levels on this scale were ultimately 
differentiated. They are as follows: 


Constructiveness 2. Toys examined superficially. (a) Sits on floor and takes truck 
and trailer in hand, 10 seconds; (b) Shakes iron once, teddy bear once, holds truck 
in hand, holds truck fingering it, 20 seconds. 

Constructiveness 3. The truck is moved to a definite place or from one place to 
another. (a) Phone, truck, and trailer, manipulated and carried to window sill, 
25 seconds; (b) Bends over to truck and trailer, pushes back and forth, 15 seconds. 

Constructiveness 4. This is a somewhat more complicated manipulation of the 
truck. (a) Truck and trailer backed under chair, 15 seconds; (b) Stands up. Picks 
up truck and trailer, detaches. Takes truck in hand, examines closely. 70 seconds. 

Constructiveness 5. This is a definitely more complicated and elaborated manipula- 
tion of the truck. (a) Truck and trailer unloaded, detached; pulled in circles, reat- 
tached, detached, reattached; pulled in circles. 45 seconds. (b) Takes doll, puts on 
truck and trailer, “He doesn’t sit up very well.” “I lay the teddy down.” They are 
both lying down on trailer as trailer is pushed back and forth. 

Constructiveness 6. The truck is used as a means to haul other things. (a) Takes 
truck and trailer. “More things are going to be hauled.” Puts cup, saucer, teapot 
on trailer. Talks to self. “Ride along, mister.” To square 3. 60 seconds. (b) “This 
is a fire truck.” To middle of room. Around in middle. “You can load things in it. 
Mr. Duck! I’ll haul Mr. Duck.” 45 seconds. 

Constructiveness 7. The meaning of the play is an extensive “trip” or another 
elaborated story in which the handling of the truck is merely a part of a larger 
setting. (a) “Here’s a car-truck, and it’s going out fishing, so we have to take the 
trailer off. First, we have to go to the gas station. Toot! Toot! Now, he’s going to 
the gas station. Ding, ding, ding.” Gets gas. Now back for the trailer and the fish 
pole; child has truck and takes the motor boat. Attaches it to the truck and trailer. 
“Hmmmmmmm! Here he goes.” Behind square 2 to 1. “Quack! Quack! Mr. 
Ducky come,” (places on truck and trailer). Goes to 3. “Here’s the sailboat.” 
225 seconds. (b) “I want the teddy bear to sleep. Where will be the bed for the 
teddy bear?” Chooses the truck and trailer. “Now you go to sleep. We are going to 
Minneapolis.” Puts teddy on trailer. “You can’t go, Mr. Dolly, Teddy bear goes.” 
Subject lies down on the floor and looks at teddy bear on trailer. “Toot, toot.” 
Pushes truck and trailer to barrier, then pulls back. Plays with truck and trailer 
and teddy. “Teddy bear, you will sit in the back.” Pushes truck to table. “We’re 
going to Chicago.” Gets crayon. “I want some crayon to go.” 175 seconds. 

Constructiveness 8. Play showing more than usual originality is classified here. (a) 
To square 1. Truck and trailer reattached. “I’ll bring them here.” Detaches truck, 
has it coast down trailer as an incline, reattaches. 30 seconds. (b) To truck and trailer 
at square 1. Detaches trailer, uses it as incline against ironing board. Runs truck 
up, carries it up further and further, and lets it go. Looks to experimenter for 
approval, smiling, “Did you see it? Now watch it.” Pushes truck across floor, big 
[. . .] Hits E. “See how fast it goes!” “Chugs” it over to observer’s window, looks 
[. . .] “Chugs” to table, to barrier. 205 seconds. 


Reliability. Barker, Dembo, and Lewin determined the reliability 
of their constructiveness ratings in two ways. One way required 
the dividing of the thirty-minute observation period into three 
periods of ten minutes each. Then a constructiveness score was 
assigned for each one of the three ten-minute intervals, and the 
intercorrelations presented in Table 173 were computed. 
Table 173. Reliability Data for Barker, Dembo, and Lewin’s Constructiveness Ratings* 
First and second periods........... .72 
Second and third periods........... [. . .] 
First and third periods............ [. . .] 
* From Barker, R., Dembo, T., and Lewin, K. Frustration and regression: An experiment 
with young children. Univ. Ia. Stud. Child Welf. 1941, 18, No. 1. 


These correlations are not remarkably high, but Barker, Dembo, 
and Lewin concluded that they were satisfactory. The second method 
required that units of play be classified as to time involved. Then 
those that lasted from 1 to 15 seconds, from 31 to 45 seconds, from 
61 to 90 seconds, and from 121 to 180 seconds were selected as one 
series. Those that lasted from 16 to 30 seconds, from 46 to 60 seconds, 
from 91 to 120 seconds, and 181 seconds or more were put into a 
second series. Then the correlation between the constructiveness 
indices for each of these two series was computed. It was found to be 
.79. Barker, Dembo, and Lewin entered this value in the Spearman- 
Brown Prophecy Formula and found a value of .88 for the reliability 
of the entire series of ratings. 
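
The step from .79 to .88 can be checked directly. The sketch below, in Python, assumes the standard Spearman-Brown formula for a measure lengthened by a factor k (k = 2 when two half-series are combined). It also shows one way to sort play units into the two alternating duration series described above. The function names and the series labels "A" and "B" are illustrative and do not appear in the original study.

```python
def series_for_duration(seconds):
    """Assign a play unit to one of the two duration series.

    Series "A": 1-15, 31-45, 61-90, 121-180 seconds.
    Series "B": 16-30, 46-60, 91-120, 181 seconds or more.
    """
    if seconds < 1:
        raise ValueError("duration must be at least 1 second")
    # Upper bound of each duration class, alternating between the series.
    bounds = [(15, "A"), (30, "B"), (45, "A"), (60, "B"),
              (90, "A"), (120, "B"), (180, "A")]
    for upper, label in bounds:
        if seconds <= upper:
            return label
    return "B"  # 181 seconds or more


def spearman_brown(r, k=2.0):
    """Predicted reliability when a measure is lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)


# Correlation between the two half-series, as reported in the text.
full_reliability = spearman_brown(0.79)
print(round(full_reliability, 2))  # 2(.79) / (1 + .79) = .88
```

The formula treats the two series as equivalent halves of the full set of ratings, which is why the alternating duration classes were used to form them.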


Barker, Dembo, and Lewin are careful to point out that construc- 
tiveness of play could easily have varied from one of the above 
periods to another and that this should not be taken as proof that 
the ratings were unreliable. They also point out, however, that they 
were concerned with assigning a constructiveness level to each child 
so that they could use this constructiveness score as an indicator 
to determine regression. This being the case, it became appropriate 
for them to determine the self-reliability of the constructiveness 
score for an entire observational period. 

Validity. Barker, Dembo, and Lewin’s constructiveness scale 
possesses validity by definition only. They developed their scale, as 
Landis did his, upon the basis of the empirical distinctions their 
data afforded. Therefore, they could not check its validity by 
correlating the constructiveness ratings with results secured by other 
methods of measuring constructiveness of play, for no other meas- 
ures were available. 

While our primary interest is in Barker, Dembo, and Lewin’s 
methodology, we cannot leave their experiment without some 
comment as to their findings. These are that frustration decreases 
the average constructiveness of play with accessible toys and that 
the amount of regression depends upon the strength of the frustra- 
tion involved. 

We can see in this experiment definite value in starting with a 
theoretical formulation of a problem. This framework was responsi- 
ble for the design of the experiment, and this provided for the 
collection of data appropriate to testing the hypothesis in question. 
Barker, Dembo, and Lewin were well aware of the mistakes to be 
avoided in making observations. Therefore, they provided for two 
observers. They also made earnest attempts to see that all assess- 
ments, observations, evaluations, and ratings possessed a sufficient 
degree of objectivity, reliability, and validity to give their conclu- 
sions substantial meaning. 


OBSERVATIONS OF GROUPS OF INDIVIDUALS 


We can now turn our attention to the observation of groups of 
individuals. The outstanding example of the use of group observation 
for the prediction of individual behavior is that exemplified in the 
program of the Assessment Staff of the Office of Strategic Services. 

The Office of Strategic Services was established during World 
War II by the President of the United States and by Congress. Its 
purposes were to set up and maintain research units in the United 
States and overseas, to establish and maintain a network of agents 
for gathering information concerning the nation’s enemies, and to 
conduct destructive operations behind enemy lines. The Assessment 
Staff was charged with the responsibility of selecting the personnel 
who were to be instrumental in helping the Office of Strategic Serv- 
ices achieve these objectives. This meant that the Assessment Staff 
had to develop a series of procedures which would reveal the poten- 
tialities of the candidates for assignment to positions in the Office 
of Strategic Services. As in all other selection programs, the basic 
purpose was to increase the number of successes and to decrease the 
number of failures. The procedures developed by the OSS Assess- 
ment Staff are described in a volume called Assessment of Men. We 
shall review this work here, even though it leaves much to be desired 
in the way of making a really significant contribution to the predic- 
tion of individual behavior. 

The authors of Assessment of Men describe their work as a “multi- 
form organismic approach.” They say it is “multiform because it 
consists of a rather large number of procedures based on different 
principles,” and that it is “organismic because it utilizes the data 
obtained through these multiform procedures . . . to arrive at a 
picture of personality as a whole.” The authors contrast their multi- 
form organismic approach with what they call the elementalistic 
approach. This latter approach “calls,” they say, “for . . . quanti- 
tative measurements of partial isolated processes,” while “the 
organismic approach” requires estimates “of total integrated 
processes.” For the organicist “personality is not a series of per- 
ceptible facts, but . . . a hypothetical formulation, the aim of which 
is to explain and to predict the perceptible facts.” Consequently, the 
method which the organicist supports “is that of predicting the 
future by thinking inductively from an observed set of facts to a 
conception (a hypothetical formulation of the personality)” and 
then “to think deductively from this conception to the facts which 
should be expected.” Thus, “organismic assessment is based on the 
hypothesis that a trained psychologist or psychiatrist . . . is 
capable of improving to a significant degree the accuracy of me- 
chanical predictions derived from test scores alone.” 

Guided by Henry A. Murray [. . .] gestaltism, the OSS Assessment 
Staff developed a [. . .] of assessment procedures. These we shall set 
forth in some detail, but we must point out, if indeed it will not 
become self-evident, that the great majority of these procedures were 
novel, new, and untried. 

Before we describe the assessment procedures themselves, we shall find 
it profitable to review the steps which the OSS Assessment Staff 
itself says should be followed in their development. These steps are to 

1. Make a preparatory analysis of all jobs for which candidates are to be assessed. 

2. List all the personality determinants of success and failure in each job. 

3. Select the variables to be measured. 

4. Construct a rating scale for each of the personality variables to be assessed. 

5. Design a program of assessment procedures which will reveal the strength of 
the selected variables. 
a. Plant the assessment procedures in a social matrix. 
b. Select several different types of procedures and several procedures of the 
same type for estimating the strength of each variable. 
c. Include situational tests. 

6. Formulate the personality of each assessee before making specific ratings, 
predictions, and recommendations. Acquire a concept of the individual as a whole 
before assessing each variable. 

7. Write a personality sketch of each assessee. 

8. Hold staff conferences for the purpose of reviewing and correcting the personal- 
ity sketch and of deciding upon ratings and recommendations of each assessee. 

9. Construct experimental designs as frames for assessment procedures so that 
all data necessary for the solution of strategic problems will be systematically ob- 
tained and recorded. Set up situations that will reflect to what degree the assessee 
possesses each of the several variables. 



It becomes our duty to report, in evaluating this ambitious study, 
that the OSS Assessment Staff did not arrange its procedures to 
meet its own requirements. For example, no preparatory job analyses 
were made. The reason was that great secrecy surrounded all 
field operations in the Office of Strategic Services, and apparently 
the nature of the duties for the jobs for which candidates were to be 
recruited could not be made available to the Assessment Staff. 

Job analyses are frequently overrated as to the amount of help 
they can yield in a selection-research program. Nevertheless, the 
OSS Assessment Staff indicates that such analyses are basic to an 
assessment program. This being the case, its proper course of action 
would have been to state to the proper executive and administrative 
officers that it could not proceed unless the necessary information 
were supplied. But instead of doing this, the OSS Assessment Staff 
disregarded its own first principle and proceeded to its second, 
the listing of all personality determinants of success and failure for 
each type of job for which candidates were to be assessed. 

The OSS Assessment Staff states that such personality deter- 
minants will differ from job to job. Therefore they should be sepa- 
rately determined for each job. But not knowing what jobs were 
involved and not knowing whether they differed from each other, the 
OSS Assessment Staff proceeded to disregard its second principle 
and prepared a standard list of personality traits. It implies that 
this list of traits is important in determining success or failure on 
jobs for which the duties and requirements were almost completely 
unknown. The variables which the OSS Assessment Staff chose to 
assess are as follows: 


1. Motivation for assignment 

2. Energy and initiative 

3. Effective intelligence 

4. Emotional stability 

5. Social relations 

6. Leadership 

7. Security 


Like the fortuneteller in telling a fortune, the OSS Assessment 
Staff picked out good traits. These are socially acceptable and are 
popularly supposed to be important for almost any job of conse- 
quence. This does not prove, however, that they are important. 

Once the personality variables to be assessed had been selected, a 
rating scale for each trait was needed. This turned out to be one scale, 
and it was used for all traits. It provided for six degrees of variation: 
very superior, superior, high average, low average, inferior, and 
very inferior. We have already described this scale in some detail in 
Chap. 10. 

We need not go into great detail in describing the OSS assessment 
procedures, but we shall give a list of them, with brief explanatory 
comments. 


1. Sentence completion test. This consisted of the beginnings of 100 sentences. The 
task set for the candidate was that of completing these sentences in as rapid a 
manner as he could. The areas covered by the sentences were “(a) family, (b) the 
past, (c) drives, (d) inner states, (e) goals, (f) likes and dislikes, (g) energy, (h) re- 
action to frustration and failure, (i) time perspective, (j) optimism-pessimism, 
(k) reaction to others, and (l) reactions of others.” 

2. Health questionnaire. Typical type. 

3. Work conditions survey. A list of 43 conditions which can exist in a job. Candi- 
date was required to rate each condition on a 6-point scale indicating how accept- 
able or unacceptable it would be to him. 

4. Vocabulary test. Fifty multiple-choice items taken from the American Council 
on Education Psychological Examination, the Atwood-Wells Wide Range Vocabu- 
lary Test, and the CAVD (The Institute of Educational Research Intelligence 
Scale). 


5. Personal history form. Typical type. 


6. Projective questionnaire. Twelve questions such as “What was the greatest 
lack in your childhood?” “What things or situations are you most afraid of?” and 
so forth. 

7. Belongings test. Thirty-six questions concerning the character of a man. The 
answers to these questions were to be inferred from the nature of 26 items left in 
the man’s bedroom which the candidate was given four minutes to examine. 

8. Terrain test. A test of the extent to which the candidate was able to reproduce 
from memory the various points depicted on a map of the surrounding country and 
the location of the buildings and of his ability to infer the history of the farm from 
his observations. The time allowed for the observations and for study of the map 
extended from one day to the next, but the candidate had to fit this as best he could 
into a crowded schedule. 

9. The brook test. Four to seven candidates, as a group, were instructed to trans- 
port certain objects across a brook. This required their utilization of boards and 
ropes to build a bridge or of pulleys, trees, and ropes to form a cable or swing. The 
object was to test leadership and methods of procedure, neither of which items 
were covered by specific instructions. 

10. The wall problem. Four to seven candidates, as a group, were led to a wall 
10 feet high and 15 feet long. They were told that the only way to escape (from the 
Japanese) past this barrier was over the top (they could not go or look around it). 
Furthermore there was an additional wall behind the first, and between the two 
walls there was a 200-foot canyon. Their task, in addition to getting themselves 
past the two barriers, was to transport a heavy log (their king-size bazooka) over 
with them. Scattered about and available for use (if the candidates so chose) were 
one board and two two-by-fours. The board was a little longer than the log, and 
the two-by-fours were 2 feet and 3 feet in length. 

11. Construction test. The task set for the candidate was that of directing two 
workmen in the construction of a giant Tinker Toy cube. The two workmen were 
to do all that the candidate specifically directed them to do but, unknown to the 
candidate, were to obstruct progress in every way possible. Ten minutes was 
allowed. 

12. Postconstruction interview. Intended to be therapeutic because of the stress 
engendered by the construction test, which, incidentally, no candidate was ever 
able to complete. 

13. The interview. A typical type of clinical interview. 

14. The OWI test. The candidate was told he was to have charge of propaganda 
activities in Korea to win Koreans over to our side. In twenty minutes he was to 
tell “what information he would want to have” in order to carry on a successful 
campaign. The object was to find out if the candidate was sensitive to items of 
Korean culture. 

15. Map memory test. The candidate was given eight minutes to study a map, and 
then had to answer 30 multiple-choice questions concerning it. 

16. Mechanical comprehension test. The Bennett Mechanical Comprehension Test. 

17. Manchuria test. The candidate was instructed to prepare (a) a leaflet and (b) 
a two-minute 200-word spot radio broadcast designed to lower the morale of the 
workers and guards on the South Manchuria Railway. 

18. Discussion. Candidates, as a group, were instructed to discuss for [. . .] 
minutes the major postwar problems facing the United States and, if time permitted, 
the lines along which these problems should be solved. 

19. Interrogation test. The candidate was given twelve minutes to interview a 
“tail gunner” recently escaped from a Japanese prison camp. He was to find out 
(a) the location of the camp, (b) how prisoners were treated, (c) the size of the camp, 
and (d) any other intelligence deserving consideration. 

20. Stress interview. The candidate was allowed twelve minutes to invent a 
cover story for the “fact” that he had been discovered going through some secret 
papers in a government office (of which he was not an employee) in Washington. 
He was then interrogated in rapid-fire, third-degree fashion for ten minutes, was 
told that he had not been telling the truth and that he had failed the test, and was 
then dismissed. 

21. Poststress interview. To some extent therapeutic, but it also consisted of an 
attempt to have the candidate break the security regulations under which he had 
been placed at the beginning of the three-day testing period. 

22. Six-2 test. The candidate was given a map and four documents: (a) a report 
on the interrogation of Chinese refugees, (b) an English translation of a Japanese 
captain’s report, (c) an American lieutenant’s report on conditions in occupied 
territory, and (d) a translation of a Chinese military document. The candidate was 
given thirty-five minutes to make a list of all items of information having relevance 
for enemy action in a designated area and to classify these items of information 
with respect to their probable truth or falsity. He was then given ten minutes to 
prepare a 50-word dispatch for transmission to headquarters. 

23. Teaching test. The task set was that of explaining the construction and use 
of a time-pencil: a device similar to a delayed-action fuse used in demolition work. 
The candidate was given a typed 400-word description of the device, assembled 
and unassembled samples, and diagrams. He was to prepare a talk lasting not 
more than fifteen minutes which would make clear both the construction and the 
use of the time-pencil. 

24. Assigned leadership. Candidates, in groups of five or more, one of whom 
had been designated as a leader, were instructed to solve five field problems. The 
nature of these problems can be illustrated [. . .] them, the mined road, to the 
leader of [. . .] candidate groups. 

“Joe, you’re the leader in this situation. You’re leading this group back from a 
mission in enemy-occupied territory. You and your men have blown up a bridge 
about a mile away in this direction (points behind them). According to prearranged 
plans, you must meet a guerilla truck about a mile away from here in this direction 
at a time which leaves you only [. . .] to spend getting across this road. 
You have discovered that this road has been mined with a new type of sensitive 
[. . .] dig up. The limits of the problem [. . .] enemy, but so far they don’t know in 
which direction you have gone. Your problem is obvious: to get across and leave as 
little trace of your escape route as possible.” 
There were scattered about for use by the group, if it chose to use them, two logs 
(12 and 14 feet), some stones, a few 4-foot two-by-fours, and two lengths of rope. 

25. Names and Identifications. This test consisted of two sets of 12 slides. Each 
slide in the first set contained (a) a man’s photograph, (b) his name, (c) age, (d) 
occupation, and (e) residence, city, and state. The second set had only the photo- 
graphs. The slides were shown in sets of four, each slide being exposed for twenty 
seconds. The corresponding slides in the second set were then exposed for thirty 
seconds, and the candidate was to reproduce correctly for each man all data con- 
tained in the first slide. 

26. Movie observation test. This consisted of two motion-picture sequences lasting 
about two and one-half minutes each. The task of the candidate was to observe 
the actions of two men, after they had been identified, and then later to answer 
a set of questions about their actions. 

27. Code aptitude test. The Signal Corps Code Aptitude Test. It consists of 78 
pairs of patterns of code sounds. The task of the candidate was to indicate whether 
these patterns are the same or different. 

28. Obstacle Course. This included such tasks as rings, hand-over-hand progres- 
sion on a horizontal rope, scaling a hedge by means of variously inclined boards, 
scaling a smooth wall, getting across narrow (3½ inch) and high (8 or 10 feet) 
catwalks, and the Army’s Running Time Course. 

29. Improvisations. A Moreno-type psychodramatic test. Each candidate could 
choose a role he wished to play from any one of nine previously prepared manu- 
scripts. Then with another candidate or staff member he had to act out the situa- 
tion portrayed in the manuscript. One of the situations was as follows: 

Mr. F of this organization has been working as an administrative assistant for 
about two months. He feels that he has been doing a good job. His superior, Mr. G, 
however, is so dissatisfied with the work of his assistant that he decides to call him 
into his office. The scene to be enacted is the conversation between Mr. F and 
Mr. G. 

30. Debate. An informal panel discussion of such questions as “What are we 
fighting for?” and “What shall we do with Germany (or Japan) after the war?” 
Candidates were divided into opposing groups, according to whether leniency or 
severity was favored, and were given twenty minutes to select a chairman and to 
prepare arguments. Debate was carried on in an informal manner. Drinking was 
encouraged. 

31. Sociometric questionnaire. The candidate was asked to answer a series of 
questions concerning other candidates. He was, for example, asked to list by num- 
ber those candidates whom he would recommend as supervisors of a group dealing 
with the problems of planning and organization. Each candidate could name as 
many or as few other candidates as he wished. There were 11 questions. 

32. Judgment of others. The candidate was asked to write personality or character 
sketches of five of the other candidates. 

33. Murder mystery. The candidate was given a copy of the mythical “Fairfield 
Chronicle,” detailing the discovery of the dead body of a woman. The candidate 
was asked to decide whether the presumption of suicide, as given in the paper, was 
correct. If the candidate decided it was not and if he felt that foul play had been 
involved, he had to decide who committed the murder. As help, he could [. . .] 
certain designated staff members who presumably knew some of the facts in con- 
nection with the case. When candidates had prepared their solutions, these were 
presented in a “court” session called for the purpose. 

34. Athletic events. Broad jump, high jump, and shot-put. 


35. Baseball game. A frivolous competition between staff and candidates on 
opposing teams. 


Analysis. It is difficult to describe the methods of analysis used by 
the OSS Assessment Staff in assessing each of the personality vari- 
ables by means of the foregoing tests. Indeed the authors of Assess- 
ment of Men state that they are not entirely sure themselves as to 
the exact nature of the mental processes involved in making any 
given assessment. The nearest we can come to giving some insight 
into these processes, whatever they may have been, is to show how 
the variable “social relations” was evaluated. 


First, let us note that “social relations” encompasses in its 
definition the ability to get along well with other people, good will, 
team play, tact, freedom from disturbing prejudices and annoying 
traits, and so forth. It can be assessed, according to the OSS Assess- 
ment Staff, by interview, by informal observation, by individual 
test situations (construction and improvisations, for example), by 
group task situations, by projective tests, and by sociometric 
questionnaire. 

The procedures followed by the OSS Assessment Staff carry with 
them the implicit assumption that each test or situation is useful 
as a basis for diagnosing the strength of several different traits. For 
example, in the brook test it was felt that a candidate who offered 
a suggestion had “energy and initiative”; if his suggestion was 
relevant, he had “effective intelligence”; if the candidate accepted, 
in a good-natured manner, the rejection of his suggestion, he had 
good “social relations”; and if his suggestion was adopted and if it 
worked, he had “leadership ability.” Thus the strength of each 
personality variable was to be assessed upon the basis of several 
different tests and situations, and each test and situation was to 
contribute to the analysis of several different personality variables. 

The steps in making each assessment were as follows: Following 
each situation the observers, who were staff members, would make 
their ratings independently of each other. They would then meet in 
conference and decide upon one joint rating to be submitted for 
later consideration. The staff would meet in a postimprovisations 
conference, discuss the performance of the candidates during im- 
provisation, and come to some joint decision concerning the ratings 
to be assigned. A personality sketch was written, in part by the 
interviewer and in part by the situationist, i.e., the staff member 
who observed the candidate in the various test situations. A staff 
conference was held. At this conference the sketch just mentioned 
would be read, discussed, amended, and corrected. Then a decision 
would be made as to whether the candidate was acceptable or un- 
acceptable. Following this decision the staff proceeded, in conference, 
to give final ratings on all personality variables. 

Situational Tests. Most of the tests used by the OSS Assessment 
Staff were situational in nature. We shall do well, therefore, to 
acquaint ourselves with the specialized requirements for such testing. 
As formulated by the OSS Assessment Staff, these requirements are 


as follows: 


1. The task should have a number of solutions. This is exemplified in the brook
test that could be solved (a) by building a bridge, (b) by using a rope and pulley,
(c) by lassoing a tree and constructing a swing. The facility with which a solution
is selected and acted upon can then be observed.

2. The task should not require specialized abilities. This requirement undoubtedly
stems from the organicist’s philosophy that he is interested in personality as a
whole and thus wants to tap those aspects of the components of personality which
tend to color or influence the whole of personality rather than just one aspect of it.

3. The task should be designed to reveal kinds of behavior which cannot be registered
mechanically. This requirement does not seem necessary. It would be expensive, of
course, but there is no intrinsic reason preventing the photographing and recording
of all that any observer can see or hear. In fact it might have proved
worthwhile for the OSS Assessment Staff to have photographed and to have
recorded all situations in which candidates were observed. This would have made
it possible for other observers to attempt an evaluation of the behavior and through
this experience to provide clues as to their methods of analysis.

4. Each task should force the candidate to reveal a dominant disposition of his
personality. This requirement is obviously rooted in the philosophy of the
organicist’s approach in desiring to assess the whole of personality rather than
some single component of it.

5. Each task should, if possible, require group interaction. The OSS Assessment Staff
asserts that such tasks are more productive than tasks which do not require such
interaction. This assertion is rather hard to reconcile with another to the effect
that the interview is the most revealing single test or situation that has been tried.

6. The task should require the coordination of numerous components of personality.
Again, a requirement rooted deep in the philosophy of the organicist’s approach.

7. Each task should be modified to fit the experience and abilities of the candidate.
While this at first might seem a desirable requirement, its constant application
would defeat the experimental nature of the test situation. After all, one of the basic
requirements for making predictions is that we submit all candidates to the same
tests or situations and see how they react differentially. Only if we do this can we
be sure of knowing whether one candidate is better or worse than or equal to some
other candidate.

8. Candidates should discuss performance after each situation. This seems a desir- 
able and reasonable requirement. Emotional catharsis is needed after some of the 
more stressful situations to which the candidate is subjected. 


9. The members of the staff should have time to confer with each other concerning the 
results of the assessment. Obviously! 


The type of data which a situational test makes available for 
analysis can best be illustrated by the protocol secured in testing a 
candidate in the construction test. This protocol is given in Table 
174 and was secured in response to the following directions: 


We have a construction problem for you now. We want you to build a structure
using the equipment lying around here. Let’s see. (The staff member appears to
ponder which of two or three models of different design to use.) I guess we’ll give
you this model to copy. (Staff member picks up the model which is always used
from among the others and shows it to the candidate.) You see there are short
5-foot and long 7-foot poles lying on the ground. (Staff member points out one of
each size.) The sides of the frame which you are to build are made of 5-foot poles,
and the diagonals of 7-foot poles. (Staff member demonstrates this on the model.)
Do you understand?

Now the corners where the poles come together are made like this. You take
a half block and put it through a full block. Then you cinch it with a peg, like this.
(Staff member demonstrates all this.) Then when you put the corner down on the
ground, you can put the 5-foot poles in here, here, and here, and the 7-foot diagonals
here and here. Do you understand?
Now (Staff member picks up the cor: 
there are holes for pegs like this at eac 


ocket, to cinch it with a peg, because 
ill not be stable. (Staff member then 
his all clear? 

Š em ore important than that, it is a test of 
leadership. I say that because it is impossible for one man working alone to complete 
this task in the ten minutes allotted to do it, Therefore we are going to give you 
ou are to be the supervisor, their boss- 
: s foreman, you will follow more or less 
of a hands-off policy. Let them do the manual labor. You can assume that they 
have never done such work before and k 


Just ten minutes in which to do the job.
I’ll call your two helpers.

Table 174. Sample Protocol for the Construction Test*

* From Assessment of Men. New York: Rinehart & Company, Inc., 1948.

Staff Member (calling toward the barn).—Can you come out here and help this man for a
few minutes?

Buster and Kippy.—Sure, we’ll be right out.

Staff Member.—O.K. Slim, these are your men. They will be your helpers. You have
ten minutes.

Slim.—Do you men know anything about building this thing?

Buster.—Well, I dunno, I’ve seen people working here. What is it you want done?

Slim.—Well, we have got to build a cube like this and we only have a short time in which to
do it, so I’ll ask you men to pay attention to what I have to say. I’ll tell you what to do and
you will do it. O.K.?

Buster.—Sure, sure, anything you say, Boss.

Slim.—Fine. Now we are going to build a cube like this with 5-foot poles for the uprights
and 7-foot poles for the diagonals, and use the blocks for the corners. So first we must build
the corners by putting a half block and a whole block together like this and cinching them with
a peg. Do you see how it is done?

Buster.—Sure, sure.

Slim.—Well, let’s get going.

Buster.—Well, what is it you want done, exactly? What do I do first?

Slim.—Well, first put some corners together—let’s see, we need four on the bottom and four
topside—yes, we need eight corners. You make eight of these corners and be sure that you pin
them like this one.

Buster.—You mean we both make eight corners or just one of us?

Slim.—You each make four of them.

Buster.—Well, if we do that, we will have more than eight because you already have one
made there. Do you want eight altogether or nine altogether?

Slim.—Well, it doesn’t matter. You each make four of these and hurry.

Buster.—O.K., O.K.

Kippy.—What cha in, the Navy? You look like one of them curly-headed boys all the girls
are after. What cha in, the Navy?

Slim.—Er, no. I am not in the Navy. I’m not in anything.

Kippy.—Well, you were just talking about “topside” so I thought maybe you were in the
Navy. What’s the matter with you—you look healthy enough. Are you a draft dodger?

Slim.—No, I was deferred for essential work—but that makes no difference. Let’s get the
work done. Now we have the corners done, let’s put them together with the poles.

Kippy.—The more I think of it, the more I think you are in the Army. You run this job just
like the Army—you know, the right way, the wrong way, and the Army way. I’ll bet you are
some second lieutenant from Fort Benning.

Slim.—That has nothing to do with this job. Let’s have less talk and more work.

Kippy.—Well, I just thought we could talk while we work—it’s more pleasant.

Slim.—Well, we can work first and talk afterward. Now connect those two corners with a
5-foot pole.

Buster.—Don’t you think we ought to clear a place where we can work?

Slim.—That’s a good idea. Sure, go ahead.

Buster.—What kind of work did you do before you came here? Never did any building, I
bet. Jeez, I’ve seen a lot of guys, but no one as dumb as you.

Slim.—Well, that may be, but you don’t seem to be doing much to help me.

Buster.—What—what’s that? Who are you talking to, me? Me not being helpful—why,
I’ve done everything you have asked me, haven’t I? Now, haven’t I? Everything you asked
me. Why, I’ve been about as helpful as anyone could be around here.

Slim.—Well, you haven’t killed yourself working and we haven’t much time, so let’s get
going.

Buster.—Well, I like that. I come out here and do everything you ask me to do. You don’t
give very good directions. I don’t think you know what you are doing anyway. No one else
ever complained about me not working. Now I want an apology for what you said about me.

Slim.—O.K., O.K., let’s forget it. I’ll apologize. Let’s get going. We haven’t much time.
You build a square here and you build one over there.

Buster.—Who you talking to—him or me?

Kippy.—That’s right—how do you expect us to know which one you mean? Why don’t
you give us a number or something—call one of us “number one” and the other “number
two.”

Slim.—O.K. You are “one” and he is “two.”

Buster.—Now wait a minute. Just a minute. How do you expect to get along with people
if you treat them like that? First we come out here and you don’t ask us our names. You call
us “you.” Then we tell you about it, you give us numbers. How would you like that? How
would you like to be called a number? You treat us just like another 5-foot pole and then you
expect us to break our necks working for you. I can see you never worked much with people.

Slim.—I’m sorry, but we do not have much time and I thought we . . .

Kippy.—Yes, you thought. Jeez, it doesn’t seem to me that you ever did much thinking
about anything. First you don’t ask our names as any stupid guy would who was courteous.
Then you don’t know what you did before you came here or whether you are in the Army,
Navy or not, and it’s darn sure you don’t know anything about building this thing or directing
workers. Cripes, man, you stand around here like a ninny arguing when we should be working.
What the hell is the matter with you, anyway?

Slim.—I’m sorry—what are your names?

Buster.—I’m Buster.

Kippy.—Mine’s Kippy. What is yours?

Slim.—You can call me Slim.

Buster.—Well, is that your name or isn’t it?

Slim.—Yes, that is my name.

Kippy.—It’s not a very good name—Dumbhead would be better.

Buster.—Where do you come from, Slim?

Slim.—Cincinnati.

Buster.—That’s out in Ohio, isn’t it?

Buster.—What’s the river it’s on?

Slim.—Uh—Why the Ohio.

Buster.—You don’t sound very sure. I almost wonder if you do come from there. I’d think
any Cincinnatian would remember the name of the river.

Slim.—I’m from Cincinnati, all right. I lived there for eight years.

Buster.—Down by the river? In the tenement district?

Slim.—No, in a residential region up to the north.

Buster.—What street?

Slim.—Why, 1490 Kingsbury Street. What does that have to do with the present problem?

Buster.—The reason I asked was you don’t seem to be very well dressed, and I thought
probably you hadn’t made much of a success of your business and couldn’t live in a nice part
of town.

Slim.—Be that as it may—we’ve got to get back to work. You aren’t doing anything except
talking and the time is passing rapidly.

Buster.—Well, what kind of a boss are you anyway? You haven’t told me anything to do.
You stand there and “get to work, get to work,” but you don’t say what I should do.
Another thing, Kippy’s just sitting over there trying to make that pole stick into the dirt and
you don’t make him work. You might at least treat us both the same. Why don’t you act like
a boss? Why don’t you say, “Come here, Kippy, you good-for-nothing, and justify your
existence. Get some work done.”?

Slim.—Come on over, Kippy, he’s right. We all have to work together. You haven’t been
doing your part. Don’t you want to help?

Kippy.—Sure I do, but you haven’t told me anything to do.

Slim.—I certainly did. I said to make some corners and you just went over there and sat
down.

Kippy.—If that’s the way you’re going to talk to me yelling and hollering and losing your
temper—Just because you can’t give orders a fellow can understand, I don’t have to work for
you. You’ve got to be decent.

Slim.—Well, O.K. I’ll show you exactly. I want you to help me make four corners for the
bottom of this using a whole block and a half block pegged together with a peg like this.

Kippy.—Well, why didn’t you say so long ago? You sure wasted a lot of time.

Buster.—We’ve got to work faster.

Slim.—That’s right, Buster.

Buster.—I suppose you know you’re not very observant.

Slim.—What do you mean?

Buster.—See those four holes in the ground? They’re just 5 feet apart in a square, aren’t
they? What does that bring to your mind? Could it be the place to lay the corners down on
the ground to make them firm? You have your corners standing up on the rolling edges and
that isn’t very stable.

Slim.—It looks all right to me, if the four poles were put into the corners.

Buster.—O.K., if you want to sacrifice stability for mobility, it’s up to you. But you might
at least accept a suggestion in the spirit in which it’s given. “I’m the boss,” you say, “I’m
better than those other guys. If I’m in charge I’m not going to listen to them. Even if they are
right, I won’t admit it, because I’m going to show them who’s in control around here.”

Slim.—Well, we’ll try it your way but I don’t think it’s necessary.

Buster.—Slim isn’t your real nickname, is it? It couldn’t be with that shining head of
yours. What do they call you, Baldy or Curly? Did you ever think of wearing a toupee? It
would keep you from getting your scalp sunburned.

Slim.—I don’t see what difference it makes. Come on, both of you, and put an upright in
each corner.

Kippy.—He’s sensitive about being bald.

Buster.—Yeah. . . Well, Captain, we don’t seem to be getting much done here, do we?

Slim.—Well, if you guys would get to work we would.

Buster.—Well, it seems to me it’s sorta late now. Why don’t you be a man and admit that
you can’t do this job. After all, it’s only a toy and sort of foolish for a grown man. It’s nothing
to be ashamed of that you can’t build it. It’s just not in your line.

Slim.—Well, I’d like to do as much of this as possible. Will you help me?
Buster.—Sure, sure, we’ll help you, but it doesn’t seem to be much use. What do you want
us to do now?

Slim.—Well, one of you build a square over there just like this one while the other one puts
in the uprights and diagonals on this one.

Kippy.—May I ask a question?

Slim.—Sure, go ahead.

Kippy.—Why build one over there? What are you going to do with it then?

Slim.—Well, we’ll put it on top—the top of this cube is like the bottom.

Kippy.—Well, if that isn’t the most stupid thing I ever heard of. Since when do you build
the roof of a house and lift it to the top? Why not build it right on the top? Listen, when you
build a house you build the foundation, then the walls, and then the roof. Isn’t that right?

Slim.—Well, that is usually the way it’s done, but I think we can do this job this way. In
fact, I don’t think it matters much which way we do it. Either way is O.K., I guess.

Buster.—You guess, you guess. What kind of a man are you anyway? Why in hell don’t
you make up your mind and stick to it? Be decisive—didn’t they tell you that in OCS—be
decisive—even if you are wrong, be decisive, give an order. What are you—man or mouse?

Kippy.—Oh, it’s no use talking, Buster, when he doesn’t have a bar on his shoulder he
doesn’t know what to do. Listen, Mac, you’re not on Company Street now. You haven’t a
sergeant to do your work for you. You’re all alone and you look pretty silly. Why, you can’t
even put together a child’s toy.

Slim.—Now, listen to me, you guys, are you going to work for me or aren’t you?

Buster.—Sure, we want to work for you. We really don’t care. We’d as soon work for you
as for anyone else. We get paid all the same. The trouble is we can’t find out what you want
done. What exactly do you want?

Slim.—Just let’s get this thing finished. We haven’t much more time. Hey there, you, be
careful, you knocked that pole out deliberately.

Kippy.—Who, me? Now listen to me, you good for nothing young squirt. If this darned
thing had been built right from the beginning the poles wouldn’t come out. Weren’t you told
that you have to pin these things? Why, none of it is pinned; look at that, and that, and that!
(Kicks the poles which were not pinned out of position and part of the structure collapses.)

Slim.—Hey—you don’t have to knock it all down.

Buster.—Well, it wasn’t built right. What good was it without pins?

Slim.—I told you guys to pin it.

Kippy.—I pinned every one you told me about. How did I know you wanted the others
pinned? Jeez, they send a boy out here to do a man’s job and when he can’t do it he starts
blaming his helpers. Who is responsible for this—you or me? Cripes, they must really be
scraping the bottom of the barrel now.

Staff Member (walking in from the sidelines).—All right, Slim. That is all the time we
have. The men will take this down.

Buster.—Take what down? There’s nothing to take down. Never saw anyone get so little
done.


Reliability, Objectivity, and Validity. Let us now examine the
value of the OSS assessment procedures in terms of their reliability,
objectivity, and validity.

Reliability. We usually define reliability as some type of
self-consistency, and usually as the type that obtains between two
equivalent tests of the same psychological function. Very few tests
in the OSS Assessment program allowed a determination of
reliability in the usual way.

For most of the tests reliability was determined from the inter- 
correlations found to obtain between the assessments of the same 
variable in different situations. For example, “energy and initia- 
tive” was assessed by interview, by the brook test, by the construc- 
tion test, by assigned leadership, by the obstacle course, by 
discussion, and by debate. ‘The median intercorrelation between 
assessed “energy and initiative” in all of these tests was 37, A 
complete summary for all variables is given in column 1 of Table 175. 


Table 175. Reliability Data for the OSS Assessment Variables*

                              Average        Average
Trait                         intra-trait    inter-trait
                              correlation    correlation
Physical ability                 .52            .10
Propaganda skills                .47            .30
Leadership                       .41            .45
Energy and initiative            .37
Emotional stability              .30            .31
Social relations                 .30            .31
Effective intelligence           .29            .32
Observation and report           .26            .27
Security                         .19            .30

* From Assessment of Men. New York: Rinehart & Company, Inc., 1948.


These values are certainly not very high. They are also far below 
minimum acceptable standards for use in individual prediction. In 
their defense, however, it may be said that the values given fail to 
take into account the fact that the final assessment of each personality
variable was based on a number of situations. Thus “energy
and initiative” was assessed upon the basis of seven tests. Therefore
if the total test of “energy and initiative” is considered 7 times as
long as each component part of the test, the reliability of the final
assessment of “energy and initiative” may be considerably greater
than that indicated.
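
The allowance for test length can be made concrete with the Spearman-Brown formula for a lengthened test. The sketch below is an illustration added here, not a computation reported in Assessment of Men. It assumes that the median intercorrelation of .37 may stand in for the reliability of a single situation and that the seven situations are roughly parallel measures; neither assumption is established by the data. The function name is hypothetical.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Projected reliability of a composite k times as long as one part.

    Assumes the k parts are parallel measures, each with reliability r_single.
    """
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Values taken from the text: median intercorrelation for
# "energy and initiative" and the number of situations used to assess it.
median_r = 0.37
n_situations = 7

# Treating .37 as a single-situation reliability is an assumption.
composite = spearman_brown(median_r, n_situations)
print(f"{composite:.2f}")  # prints 0.80
```

Under these assumptions the projected figure is about .80, which is appreciably higher than .37 but still rests on the premise that the separate situations measure the same thing.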

This allowance cannot wholly mitigate our concern, however.
Column 2 of Table 175 shows the average intercorrelations of the
assessments for each variable. Thus the assessment of “emotional
stability” correlates, on the average, .31 with the assessments on
all other variables. This correlation, be it noted, is even a little
higher than the intercorrelation of the various assessments of
“emotional stability” among themselves. In view of the fact that various
assessments of “emotional stability” do not correlate any higher
with each other than an over-all assessment of “emotional stability”
correlates with other personality variables, it is difficult to see how
these other variables have, in fact, proved to be such. Are they not,
too, just additional manifestations of the trait “emotional stability”?
In only two instances are the intercorrelations among the assess- 
ments for one variable higher than those among the assessments for 
all variables. These two variables are physical ability and propa- 
ganda skills. In the first instance the average intracorrelation for 
various assessments of physical ability is .52, and the average inter- 
correlation of physical ability with all other traits is only .10. In 
the second instance, propaganda skills, the difference is not nearly
so marked. The intracorrelation is .47, and the intercorrelation is
.30. In all other cases the intra- and intercorrelations are practically
identical. In view of this fact, we cannot refrain from concluding
that the OSS Assessment Staff was unsuccessful in making assessments
with a very satisfactory degree of reliability.


We can examine reliability not only from the standpoint of the
final assessment for each personality variable but also from the
standpoint of the relative value of each method of assessment. That
is, in assessing “social relations,” is the interview more or less reliable
than the construction test, and so forth? Because of the low reliabilities
reported and because of the fact that these reliabilities do
not differ greatly from the intercorrelations between the personality
variables, it might seem that we cannot answer the questions which
we have just posed. In a strict sense this is true, but we can at least
determine the relative influence which each test or situation has
upon the final rating for each variable. We can determine the extent
to which the assessment made in each situation correlates with the
final rating assigned on each variable. For example, Table 176 shows
that “energy and initiative,” as assessed by interview, correlates .78
with the final rating (based on interview, brook, construction,
assigned leadership, obstacle, discussion, and debate). This correlation
is spurious because the interview itself, along with other tests,
is included as a basis for the final rating. The OSS Assessment Staff
does not present the nonspurious correlation, however (that is, the
correlation between the interview assessment and the final rating
with the interview excluded), so we must do the best we can with
the data which it presents.
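
Had the necessary standard deviations been reported, the spurious element could be removed with the usual part-remainder correction. The following is a minimal sketch of that correction, offered only as an illustration. It assumes the final rating behaves like a simple sum of the situation ratings, which the conference procedure described above does not guarantee. The two standard deviations are hypothetical values chosen for the example; only the .78 comes from the text.

```python
import math


def part_remainder_correlation(r_part_total: float,
                               sd_part: float,
                               sd_total: float) -> float:
    """Correlation of a part with (total minus part).

    Valid when the total is a plain sum that includes the part.
    """
    numerator = r_part_total * sd_total - sd_part
    denominator = math.sqrt(
        sd_total ** 2 + sd_part ** 2 - 2.0 * r_part_total * sd_part * sd_total
    )
    return numerator / denominator


r_reported = 0.78    # interview vs. final rating, as reported in the text
sd_interview = 1.0   # hypothetical: not given in the source
sd_final = 4.0       # hypothetical: not given in the source

corrected = part_remainder_correlation(r_reported, sd_interview, sd_final)
print(round(corrected, 2))
```

With these made-up standard deviations the corrected value falls to roughly .65. The size of the drop depends entirely on the unreported standard deviations, so the figure shows only the direction of the correction.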

The OSS Assessment Staff reports that the interview correlates
higher with the final ratings on all variables than does any other
test or situation. Therefore the interview turns out to be the most
important single factor in assessment. This can be gratifying, or it
can be disturbing—gratifying if the data are taken as demonstrating
that the time-honored, much-maligned technique of interviewing
does have value after all, disturbing if the data are taken as
demonstrating that no other test or situation tried is as good as the
interview, which itself is not very reliable anyway.

Table 176. The Relation of Individual Energy and Initiative Ratings to Final Energy
and Initiative Ratings*

Interview                .78
Brook
Construction             .56
Assigned leadership
Obstacle                 .41
Discussion               .55
Debate                   .54

* From Assessment of Men. New York: Rinehart & Company, Inc., 1948.

The authors of Assessment of Men state that the interview was one 
of their most valuable techniques of assessment. This can only mean 
that they relied on it because the other tests and situations failed 
to give the data sought. Practically all judgments and evaluations 
were modified, if modified at all, in the direction indicated by the 
interview. The fact that this occurred, however, cannot be taken as 
evidence that the interview is effective as a basis for making
predictions.
Objectivity. From the standpoint of objectivity we have little
comment to offer. Objectivity ordinarily implies that the result of
any evaluation is independent of the observer who makes the
evaluation. In the OSS Assessment program the only tests which meet this
requirement are those of the usual paper-and-pencil variety. All
the rest suffer seriously from the lack of objectivity in their scoring.
The low intercorrelations discussed in connection with reliability
may be due in part to lack of objectivity. The observer in one
situation frequently was not the observer in a second situation. Therefore
it is difficult to separate the relative influence of the lack of reliability
intrinsic to the situation from that due to lack of objectivity because
assessment was made by different observers. The OSS Assessment
Staff properly objects to many paper-and-pencil tests, but they
possess a much greater degree of objectivity than the situational tests
which it used.

Validity. An attempt was made to evaluate the effectiveness of the
assessments or of their predictive value by relating them to overseas
staff appraisals, theater-commander appraisals, reassignment
appraisals, and returnee appraisals. The number of candidates for
whom one or more of these types of appraisals were secured was
pitifully small, and the validities secured were amazingly low. The
coefficients which the OSS Assessment Staff reports are given in
Table 177.


Table 177. Validity Data for the OSS Assessment of Over-all Success*
1. Overseas appraisal... ve DT 
2. Returnee appraisa 


ny ceils 
3. Theater command, sie: (228 
4. Reassignment apprai „8 


* From Assessment of Men. New York: Rinehart & Company, Inc., 1948.


Validities for specific traits were obtained only for the overseas
staff appraisals. These are given in Table 178.


Table 178. Validity Data for Specific Trait Ratings in the OSS Assessment Program*


1. Motivation
2. Effective intelligence wae aoe 
3. Emotional stability... 9g 
4. Social relations, . À -06 


5. Leadership 
* From Assessment of Men. New York: Rinehart & Company, Inc., 1948.


atings of effective 
a standardized intelligence test 
might have been used, and the chances are that it might have given 
higher validity. 

The overseas staff appraisal consisted of a threefold classification:
outstanding, average, and unsatisfactory,
other of these cate
was among them, how well each informant knew the assessee, and
how dependable each informant seemed to be. 

We shall have to agree with the OSS Assessment Staff’s indictment 
of its own work. In its own words, “None of the statistical computa- 
tions demonstrates that the system of assessment was of great 
value.” Most of the procedures followed by the OSS Assessment 
Staff violate time-honored and proved techniques of rating.
Opportunity for independent judgments by independently working
observers was all but made completely impossible. The procedures
followed allowed the ratings to be influenced to too great an extent
by the chairman of the staff discussions and conferences. And the
procedure required the submission of a general rating and full
knowledge of it before any of the more specific ratings could be made.


Comments of the OSS Staff to the contrary, this last procedure is 
highly questionable. If ratings are to be made on specific traits, they 
should be made first. If generality is found in them thereafter, well 
and good, but it should not be put into the ratings ahead of time. 
This is just what the OSS Assessment procedure did. 


is 


PERFORMANCE: EXPERIMENTAL APPROACHES


In the analysis of personality relatively little has been done with
experimental techniques in contrast with what has been done with
inventory, questionnaire, and related techniques. It may be that
the area encompassed in personality research is not amenable to
treatment by experimental methods, or it may be that those
interested in personality measurement have not been, primarily,
experimental psychologists. There is considerable truth in both of
these premises, but we ought to be sure that sufficient and adequate
experimentation is undertaken in the field of personality measurement
before we become tempted to conclude that its problems are
not amenable to experimental attack.


EYSENCK

We shall first find it profitable to review several of the studies
which Eysenck reports in his volume Dimensions of Personality. In
this volume, Dr. Eysenck reports several attempts to determine by
experiment the differences between clinically diagnosed neurotics
and nonneurotics and between clinically diagnosed introverts and
extroverts. According to Eysenck, there is no
correlation between neuroticism and introversion, as some of the
data reported in connection with the Bernreuter Personality Inventory
would seem to indicate. Eysenck came to this conclusion as a
result of some factor analyses. Whether Eysenck’s contention is
correct or not need not deter us from examining his experimental
studies. In each of these studies his subjects were selected upon the
basis of clinical judgment and not upon the basis of his factorial
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discuss Eysenck and his collaborators’ experimen- 
dark vision, exercise response, reversal 


of perspective, level of aspiration, personal tempo, perseveration, 
persistence, suggestibility, and narcosis. In each of the following 
sections, the hypotheses to be examined are that clinically diagnosed 
neurotics differ from clinically diagnosed nonneurotics and/or that 


clinically diagnosed introverts (dysthymics) differ from clinically 


diagnosed extroverts (hysterics). 
Autonomic Activity. Eysenck and Yap measured the amount of 
various conditions “such as reading, rest, 


mental work, food imagery, and whilst doing a test involving hand- 
eye coordination.” The method used to measure salivary secretion 
was that developed by Lashley and further standardized by Richter 
and Wada. In this method “a disc is held over the opening of Sten- 


son’s duct by suction and the saliva issuing from the parotic gland 


drained off through a rubber tube to 4 measuring device . . . en- 
abling the experimenter to measure secretion per unit of time IM 
cubic centimeters.” Eysenck and Yap tested 24 introverts (dys- 
thymics) and 52 extroverts (hysterics) and found that “in cach 
of... eleven experimental periods, the dysthymic [introvert] 
group [showed] less salivation than the hysteric [extrovert] group.” 
Introverts (dysthymics) secreted 41 percent more saliva than the 


extroverts (hysterics). i ; 

Dark Vision. The apparatus used to test night visual capacity 
was the Livingston rotating agon. Eysenck reports that this is: 
a can be rotated so as to present different panels 
altogether 96 letters and objects on its six sides. 
he objects are outlines of aircraft, 
test includes 30 minutes dark 
ht, followed by 10 
arefully explained. 


the dark by 


studies. We shall 
tation on autonomic activity, 


salivary secretion under 


hex 


ucture whicl 
are 
jous positions, and t! 
Preparation for the 
admitting only 3 per cent lig! 
hich the details of the test are c 
ation of the objects and letters in 


. a hexagonal str 
to the subject tested; there 
‘ie letters are placed in var 
ships, parallel lines, etc- - 
adaptation, with dark goggles, 
k room during W 
erpret 


minutes in the dar 
The subject [records] his int 
means of special Braille cards. 

ates that this H 
ision . + + 
are foot candles to 
level of illumination six letters and twi 
ute is allowed for the recording of answers. 


ic patients and compared their responses 


exagon test “ . . . measures both 
at various levels of illumination 
0012 square foot 
two objects 


Eysenck indic 
photopic and scotopic V 
foes. ranging from 00015 squ 
candles.” At each 
are exposed, and one min 


Eysenck tested 96 neurot 
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with those of over 6,000 RAF personnel. On a scale with a range 
from 0 to 32, the neurotics secured an average score of 7.1 and the 
RAF personnel an average score of 19.3. These groups are clearly 
differentiated in terms of mean scores and also, as shown in Fig. 13, 
in terms of the shapes of their respective distributions. 

Poor night vision clearly differentiates neurotic individuals from
normal subjects and also, Eysenck found, distinguishes the more
seriously ill neurotic patients from the less seriously ill neurotic
patients. In this respect he compared 50 patients with poor night
vision with 13 men with good night vision on a number of items. The
items found to distinguish the two groups appear in Table 179.

Fig. 13. Dark vision test scores for neurotic and normal subjects.
Horizontal axis: score; vertical axis: per cent. (From Eysenck,
H. J. Dimensions of Personality. London: Routledge and Kegan Paul,
Ltd., 1947.)


Exercise Response. Exercise affects oxygen uptake, pulse, and
lactate consumption. According to Eysenck:

Oxygen uptake was measured in cubic centimeters per minute . . . by means
of a Douglas bag. Pulse rate was measured by [...] first four minutes after
cessation of exercise) [...]utes) — 5(resting pulse)}. Lactate rise [was
measured] [...] of a sample of venous blood removed before and another removed
10 minutes after standard exercise.

Eysenck states that the average intercorrelation of these responses
is .56. Oxygen consumption and pulse rate correlate .63; oxygen
consumption and lactate output correlate .49; and pulse rate and
lactate output correlate .56. All three indices reflect some one
underlying common factor.
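The average intercorrelation quoted above can be checked directly from the three pairwise coefficients. The following is only a small arithmetic sketch; the dictionary keys and variable names are illustrative labels and do not come from Eysenck's report.

```python
from statistics import mean

# Pairwise correlations among the three exercise-response indices,
# as quoted in the text. The labels are illustrative only.
pairwise_r = {
    ("oxygen consumption", "pulse rate"): 0.63,
    ("oxygen consumption", "lactate output"): 0.49,
    ("pulse rate", "lactate output"): 0.56,
}

# Simple arithmetic mean of the three coefficients, rounded to two places.
average_r = round(mean(pairwise_r.values()), 2)
print(average_r)
```

The rounded mean agrees with the reported average of .56.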


Table 179. Comparison of Patients with Good and Poor Night Vision*

Item                                     Good     Poor    Critical
                                         group    group   ratio
Considerable unemployment                  0.0     28.0     4.41
Poor work history                          0.0     10.0     2.36
Discharged from the army                  15.4     46.0     2.44
Poor education                             7.7     30.0     2.29
Good mental health before illness         61.5     16.0     3.04
Previous mental illness                    0.0     16.0     3.08
Well organized personality                76.9     30.0     3.40
Very anxious and highly strung             0.0     22.0     3.76
Obsessional traits                         0.0     14.0     2.85
Cyclothymic personality                  [...]     52.0     3.67

* From Eysenck, H. J. Dimensions of Personality. London: Routledge and Kegan Paul, Ltd.,
1947.

The task used to evoke exercise was that of pedaling on a bicycle
ergometer at 42 revolutions per minute. The friction of the brake
was set equivalent to a weight of 9 pounds, so in a five-minute work
period each subject was required to do 6,750 foot-pounds of work.
Then 20 controls and 30 hospital patients, equated for weight and
age, were tested. The scores for the three responses were normalized
and then added (to give equal weight to each component) to give
one combined index of exercise response. On this combined index 10
extroverts (hysterics) secured a mean score of 7.8, and 10 introverts
(dysthymics) secured a mean score of 5.8. The 20 normal controls
secured a mean score of 4.3. Both of the hospitalized groups differ
significantly from the normal controls as well as from each other.
Therefore we may conclude, with Eysenck, that neurotics show
poorer exercise response than normals and that extroverts show
poorer exercise response than introverts.

Reversal of Perspective. In looking at ambiguous figures such
as the Maltese cross and face-vase, do introverts differ from
extroverts in the number of reversals which occur? Petrie gave these tests
to 34 extroverts (hysterics) and to 38 introverts (dysthymics) in
an effort to find out. Each of the tests was given twice: once when
each subject was told to be passive and a second time when each
subject was told to secure as many reversals as possible. No
significant differences were found.


Level of Aspiration. This concept denotes an expected level of
achievement. A person who decides that he is going “to make”
Phi Beta Kappa has a higher level of aspiration than the student who
decides he is going to be satisfied with a gentleman’s C average.
Thus, level of aspiration refers to expected achievement.

The tests which Eysenck used to measure aspiration level were
the Triple-Tester and a punch test. The Triple-Tester was
constructed by Dr. Craik of Cambridge and

. . . consists of a brass drum carrying an Ivorine cover, rotating towards the
subject. This Ivorine cover is marked out as a helical “road” with holes punched in
it. A “vehicle” in the form of a bronze ball moved sideways on a rack is steered
along this road by a steering wheel. The purpose is to keep the ball on the line of
holes; each “hit” is scored on an electric counter. The steering wheel operates the
rack through an integrating gear instead of directly. Instantaneous deflection of
the vehicle from its path is impossible with this method of transmission, and the
subject is forced to anticipate the necessary moves. The more he anticipates, the
smoother will be the path which he describes whereas rapid movements made at
the last moment will result in violent oscillations or wobbling of the vehicle which
requires correction and leads to still worse scores.

The punch test is nothing more than a Hollerith key punch used
as a basis for a code-substitution test. This is accomplished by

. . . putting before the subject a chart giving equivalent letters for the number
appearing on the keys of the punch, and exposing automatically a certain letter
whenever the punch is depressed. Thus the subject would look at the letter exposed,
read off the corresponding number from the card, depress the key bearing the
correct number, thus exposing the next letter, [...]


After preliminary explanation and trial and after having been 
told the maximum possible scores, the subject was asked to estimate 
what score he would get on each of these two tests. Then he per- 
formed the task, and was asked what score he thought he got. He 
was told what score he got and was asked to estimate what score 
he would get on the next trial. This sequence of estimate,
performance, and judgment of performance was repeated 10 times on
each test, and leads us to a consideration of the following score
variables.

Goal-discrepancy Score. This is the difference between actual
performance on a given trial and the expected performance on the
next trial. The difference is said to be positive when the expected 
level of performance is above actual performance and it is said to be 
negative when the expected level of performance is below actual 
performance. 

Attainment-discrepancy Score. This is the difference between the 
performance level attained and the expected performance level for 
this same trial. If performance turns out to be higher than the 
expected performance, the attainment-discrepancy score is positive, 
but if not, it is negative. 

Judgment-discrepancy Score. This is the difference between actual
performance on a trial and the subject’s judgment of what per- 
formance level was attained. If judgment is higher than performance, 
the difference is assigned a positive sign, but if performance is higher 
than judgment, the difference is assigned a negative sign. 

Affective-discrepancy Score. This is the difference obtained when 
the judgment-discrepancy score is subtracted from the goal-dis- 
crepancy score. Thus we start with the difference between the 
expected level of achievement and the last previous performance 
and subtract from this the difference between actual performance 
on the last trial and his judgment as to what his performance ac- 
tually was. This process presumably objectifies the original goal- 
discrepancy score by ridding it of the error caused by inability to 
estimate what past performance actually was. A person with a high 
affectivity score, even more than a person with a high goal-dis- 
crepancy score is (supposedly) unable to keep his level of aspiration 
in close contact with reality, that is, with actual performance. 

Index of Flexibility. This “is the simple sum of all shifts in the 
level of aspiration during the test.” It was computed without regard 
to the direction of the change. 

Index of Responsiveness. This consisted of a simple count of the
number of times that the level of aspiration was raised after success
and was lowered after failure. A successful trial was, of course, one
in which performance equaled or exceeded expected performance,
and a failure was a trial in which performance fell below that
expected.

Fifty introverts (dysthymics) were equated to 50 extroverts
(hysterics) on age, intelligence, and ability on the test. They were
then compared with each other with respect to their average
aspiration, performance, and judgment scores. When compared with actual
performance, the aspiration scores of extroverts (hysterics) were
found to be higher than those of introverts (dysthymics), and the
judgment scores of extroverts (hysterics) were lower than those of
introverts (dysthymics). The differences are statistically significant
and are clearly demonstrated in Fig. 14. In comparing the
responsiveness scores, Eysenck also found that extroverts (hysterics) are more
rigid and less modifiable through experience than introverts
(dysthymics). And, as might be expected, they show less flexibility.
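The six score variables defined above can be written out as a short computation, which may help in keeping their sign conventions straight. This is a minimal sketch under stated assumptions: the `Trial` record, the function names, and the sample numbers are all hypothetical and are not taken from Eysenck's data; "shift" is taken to mean the change in stated expectation from one trial to the next.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    expected: float  # score the subject said he would get on this trial
    actual: float    # score actually obtained
    judged: float    # score the subject thought he had obtained


def goal_discrepancies(trials: List[Trial]) -> List[float]:
    # Expected performance on the next trial minus actual performance on this one.
    return [nxt.expected - cur.actual for cur, nxt in zip(trials, trials[1:])]


def attainment_discrepancies(trials: List[Trial]) -> List[float]:
    # Performance attained minus performance expected for the same trial.
    return [t.actual - t.expected for t in trials]


def judgment_discrepancies(trials: List[Trial]) -> List[float]:
    # Judged performance minus actual performance (positive = overestimate).
    return [t.judged - t.actual for t in trials]


def affective_discrepancies(trials: List[Trial]) -> List[float]:
    # Goal discrepancy minus judgment discrepancy, trial by trial.
    # Algebraically this is next expectation minus judged performance.
    goals = goal_discrepancies(trials)
    judgments = judgment_discrepancies(trials)
    return [g - j for g, j in zip(goals, judgments)]


def flexibility(trials: List[Trial]) -> float:
    # Sum of all shifts in aspiration, regardless of direction.
    return sum(abs(nxt.expected - cur.expected) for cur, nxt in zip(trials, trials[1:]))


def responsiveness(trials: List[Trial]) -> int:
    # Count of raises after success plus lowerings after failure.
    count = 0
    for cur, nxt in zip(trials, trials[1:]):
        success = cur.actual >= cur.expected
        shift = nxt.expected - cur.expected
        if (success and shift > 0) or (not success and shift < 0):
            count += 1
    return count


# Hypothetical three-trial record, for illustration only.
sample = [Trial(50, 40, 45), Trial(48, 52, 50), Trial(55, 55, 58)]
print(goal_discrepancies(sample), affective_discrepancies(sample))
print(flexibility(sample), responsiveness(sample))
```

Written this way, the affective-discrepancy score reduces to the next stated expectation minus the subject's own judgment of his last performance, which is why it is described as a goal discrepancy freed from the error of estimating past performance.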

On the punch test Himmelweit tested 69 extroverts (hysterics)
and 58 introverts (dysthymics). She found that extroverts (hysterics)
have higher goal-discrepancy scores, higher affective-discrepancy
scores, and are more rigid than introverts (dysthymics).

We may cite here a good example of what Hull might call the 
hypothetico-deductive approach in psychological research. Certain 
experimental results, those just discussed, led to the hypothesis that 
high affectivity scores are correlated with, or are symptomatic of, a 
poorly organized personality. If this hypothesis is correct, Eysenck 
reasoned that any condition which would “increase the affective 
relation of the patient to the task” should cause the differences
between introverts (dysthymics) and extroverts (hysterics) to be
augmented over those already reported. Therefore he had Himmelweit
offer 22 extroverts (hysterics) and 20 introverts (dysthymics)
50 cigarettes or 5 shillings if they would, upon a second testing a
week after the first test, beat their own previous score by 30 points 
on the Triple-Tester. Eysenck reports: 

The average scores on the actual test rose very significantly from the first to the
second testing for the extroverts (hysterics), but failed to show a significant rise
for the introverts (dysthymics). The affective discrepancy was larger on the second
testing for the extroverts (hysterics), smaller for the introverts (dysthymics).
Lastly, the judgment discrepancy on the second testing was considerably lower for
the extroverts (hysterics), and considerably higher for the introverts (dysthymics).
In other words, the extroverts (hysterics) under conditions of special motivation
tended to underrate their performance even more than usual, while under these
conditions the introverts (dysthymics) overrated their performance even more.

[Further] the rigidity of the hysterics did not change to any extent from first to
second testing, the dysthymics became considerably more rigid.

Fig. 14. Aspiration, performance, and judgment scores of dysthymic and
hysteric patients. (From Eysenck, H. J. Dimensions of Personality. London:
Routledge and Kegan Paul, Ltd., 1947.)


An experiment was performed. It led to a certain hypothesis. A 
second experiment was performed to test the validity of the deduc- 
tions of the hypothesis, and, in this case, these deductions appeared 
to be verified. Thus, the hypothetico-deductive method means 
nothing more than the setting up of a hypothesis, the deducing of 
certain consequences which should follow from it, and the setting 
up of experiments to test the validity of such deductions. By inference
from the experiment, a conclusion is reached as to whether or
not the original hypothesis is valid. Strictly, according to the rules
of logic, one cannot establish the correctness of an antecedent from
the truth of its consequent, but if a hypothesis leads to too
many false consequences, it certainly becomes suspect.

Personal Tempo. Petrie tested a total of 75 introverts (dys- 
thymics) and extroverts (hysterics) on a number of tests of word 
fluency but found no significant differences among them. The nature 
of the eight tasks which Petrie set for her subjects was to give as 
many responses as possible when asked to write: 


1. Round things 

2. Birds 

3. Things which might be at a certain point on the picture of a tree 

4. Things which might be at a certain point on the picture of a street corner 
5. Number of concepts for a colored Rorschach ink blot 

6. Things to eat 

7. Flowers 


8. Number of concepts for a black and white Rorschach ink blot 


Perseveration. Eysenck lists five types of perseveration. These
are sensory, associative, creative, motor, and Umstellbarkeit. The
first is that evidenced by the rate at which a color wheel with black
and white stripes must be rotated in order to make the flicker
sensation disappear. Associative perseveration is illustrated in the task
of naming the color of ink in which the name of a color is printed on
a background of still a different color. Creative effort perseveration
is that tested by use of a device, such as the mirror-drawing test, for
which an established habit must be broken before a new one can take
its place. Motor perseveration is usually measured by having a
subject do two opposite things in alternation, such as writing ZZZZ
for thirty seconds then ssss for thirty seconds, then zszszszs for
sixty seconds, and so forth. Umstellbarkeit means change from one
activity to another. We shall not take time to discuss the results
secured with these tests, since none yielded anything of significance
between certain subject groups, as between introverts and
extroverts or between neurotics and normals.

Persistence. A simple test of persistence consists in asking a
subject to sit on one chair and to hold his leg extended over another
chair with the heel of his shoe about an inch above the second chair.
The subject is asked to keep his leg in this position as long as
possible, and then he is timed until the heel of his shoe touches the
chair. Introverts (dysthymics) touched the chair in an average of
fourteen seconds while extroverts (hysterics) touched the chair in
an average of thirty-one seconds. Extroverts can persist in their
muscular pose over twice as long as introverts.

Suggestibility. Eysenck distinguishes between what he calls 
primary, secondary, and prestige suggestibility. Primary suggesti- 
bility he defines as some overt behavioral act resulting from a verbal 
suggestion; secondary suggestibility he defines as some sensory 
perception aroused as a result of suggestion; and prestige suggesti- 
bility he defines as a change in opinion or attitude due to knowledge 
of the opinions of other people (say of the majority or of prominent 
individuals). 

Three tests of primary suggestibility are the Chevreul pendulum
test, Hull’s body-sway test, and Eysenck and Furneaux’s
press-release test. In the first of these tests, the Chevreul pendulum
test, the subject is given a string with a small weight attached. He
is instructed to hold this weighted string over a particular spot
marked on a table (in front of him), but while he is attempting to
do this, the experimenter keeps telling him that the string will start
oscillating along a line and will not stay over the point marked out.
The extent to which swing is imparted to the string becomes the
measure of the subject’s suggestibility.

In the body-sway test the subject is asked to close his eyes and to
stand relaxed and quite still. While the subject is attempting this,
the experimenter keeps telling him that he is falling forward, and
the amount of sway which occurs is taken as the measure of
suggestibility.

Finally, in the press-release test the subject is asked to lie on
a couch and to hold a rubber bulb. In one part of the test he is told
to hold the bulb just as he is holding it now, but suggestions are
repeatedly made to the effect that he is squeezing the bulb. In a
second part of the test the subject is told to squeeze the bulb as
tightly as he can, and then suggestions are repeatedly made to the
effect that he is relaxing his grip. Suggestibility is measured by
change in pressure. In the first part of the test an increase in pressure
is taken as being indicative of suggestibility, while in the second part
of the test diminution in pressure is taken as being indicative of
suggestibility.

Tests of secondary suggestibility are Binet’s progressive lines and
progressive weights tests and Whipple’s picture tests. In Binet’s 
progressive lines and progressive weights tests a series of ap- 
proximately 15 lines or 15 weights is presented to the subject. The 
first five lines (or the first five weights) differ progressively in that 
each succeeding stimulus is longer or shorter (heavier or lighter) 
than the preceding one. The point is to set up in the subject an 
expectation of a continual increase (or decrease) in length or weight. 
Then when this expectation is presumably established, the subject 
is presented successively with, let us say, 10 additional stimuli. 
These are all objectively equal (in length or weight), so the measure 
of suggestibility is the number of these objectively equal stimuli 
that are said by the subject to be longer (or shorter) or heavier (or 
lighter) than the last of the changing stimuli. 
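The scoring rule for the progressive lines and progressive weights tests amounts to a simple count, which can be sketched as follows. The function name, the response labels, and the sample responses are hypothetical and serve only to illustrate the rule; the sketch assumes that only reports continuing the established trend are counted.

```python
from typing import List


def progressive_series_score(reports: List[str], established_trend: str) -> int:
    """Count the objectively equal stimuli reported as continuing the trend.

    reports           -- the subject's report for each equal stimulus,
                         e.g. "longer", "shorter", or "equal"
    established_trend -- the direction set up by the first, changing stimuli
    """
    return sum(1 for report in reports if report == established_trend)


# Hypothetical reports for ten objectively equal lines shown after an
# increasing series of five lines.
reports = ["longer", "equal", "longer", "shorter", "longer",
           "equal", "equal", "longer", "equal", "equal"]
score = progressive_series_score(reports, "longer")
print(score)
```

A subject who reported every equal stimulus as "equal" would score zero on this count.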

In Whipple’s picture test a picture is shown and then a number of
memory-type questions are asked. Inserted among these questions,
however, are several ringers, such as “What was the color of the
tie which the man was wearing?” In fact, the man was not wearing a
tie. The measure of suggestibility is the number of such questions
to which an answer is given.


Finally, a test of prestige suggestibility [...] a subject a chance
to express his opinion [...] this is, of course, to induce the
suggestibility [...] particular type of suggestibility which is the
object of study.

[...] experimentation showing that a [...]ed to neuroticism. This
type is measured by means of the body-sway test. In preparation for
this test a subject is asked to stand in a certain place, and a string
is run from his collar to a device that will indicate the amount of
body sway that will occur. Before the test itself begins, he is
observed for a period of thirty seconds, and during this time the
extent of his “natural” body sway is noted. Then following this
thirty-second preliminary period, the subject is told that a record is
to be played. The experimenter says:


“I want you to listen carefully to what the record says, while you go on just
standing there, quite still and relaxed, with your eyes closed. Listen carefully, and
just keep on standing as you are standing now. I am putting the record on now.”
[The record continues:]
“Now just keep standing there, please, quite still and relaxed, with your eyes
closed, and think of nothing in particular. Just keep standing quite still and relaxed
and listen to me. Now I want you to imagine that you are falling forward, you are
falling, falling forward, falling forward all the time. Falling, falling forward, you
are falling forward now. You are falling, falling forward, falling forward all the
time. . . .”
[This continues for two and one-half minutes.]


The maximum amount of sway is taken as a measure of suggesti- 
bility. For example, if the subject sways forward three inches and 
backward one inch, the suggestibility score is 3. If he sways back- 
ward six inches and forward four inches, his suggestibility score is 
6. If the subject falls or has to be caught by the experimenter to 
prevent his falling, the suggestibility score is arbitrarily set at 12. 
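The scoring rule just described is compact enough to state as a small function. This is an illustrative sketch only; the function and parameter names are hypothetical, and sway is assumed to be recorded in inches as in the examples above.

```python
def body_sway_score(forward_inches: float, backward_inches: float,
                    fell: bool = False) -> float:
    """Suggestibility score for the body-sway test.

    The score is the maximum sway in either direction, except that a
    subject who falls (or has to be caught) is arbitrarily scored 12.
    """
    if fell:
        return 12
    return max(forward_inches, backward_inches)


# The two worked examples from the text, plus the fall case.
print(body_sway_score(3, 1))             # 3
print(body_sway_score(4, 6))             # 6
print(body_sway_score(0, 0, fell=True))  # 12
```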


Table 180. Body Sway in Relation to Neuroticism*

Neurotic                 Men                 Women
classification      Number    Mean      Number    Mean
Normal                 60     1.02         60     1.11
I                      54     2.53        133     1.38
II                    132     3.10       [...]   [...]
III                   247     3.92         90     2.74
IV                    244     4.17        100     3.40
V                     154     5.50         54     3.61
VI                     69     5.55         19     6.72

* From Eysenck, H. J. Dimensions of Personality. London: Routledge and Kegan Paul,
Ltd., 1947.

Table 180 shows the mean absolute amounts of body sway for
each of six groups of neurotic men and for each of six groups of
neurotic women. These figures demonstrate a definite and significant
relation between neuroticism and primary suggestibility.

Narcosis. Upon the basis of the body-sway test, two contrasting
groups of subjects were selected. One called “the suggestibles” were
those who swayed three or more inches, and the other called the
“nonsuggestibles” were those who swayed less than two inches. Ten
of the suggestibles and ten of the nonsuggestibles were given an
intravenous injection of sodium amytal “till they could no longer
count backwards without making gross mistakes.” They were then
given the press-release test. All suggestible patients became more
suggestible, but none of the nonsuggestible patients became
suggestible. Thus sodium amytal can make a suggestible person more
suggestible, but it apparently cannot turn a nonsuggestible person
into a suggestible one. A control experiment on ten suggestible
patients first tested in a normal state and then after an injection
of a neutral saline solution showed that the increase in
suggestibility cannot be accounted for in terms of the injection alone. It is
due to the sodium amytal.


A similar experiment involving the inhalation of nitrous oxide
gave similar results. Not a single nonsuggestible patient, out of ten,
became suggestible under the influence of nitrous oxide. But all
but one, out of ten, of the suggestible patients became even more
suggestible under its influence.

We turn now to a consideration of three studies reported in the
Terman Commemorative Volume Studies in Personality. These
studies were conducted by Roger G. Barker, L. P. Herrington, and
Robert R. Sears. We shall learn from these studies methods of
defining a psychological variable and of determining some of its
correlates. In Barker’s study, we shall find a definition of vicarious
trial-and-error behavior, and we shall learn how this behavior is
influenced by conflict: on the one hand, by a conflict between two
desirable goals and on the other hand, by a conflict between two
undesirable goals. In the second study, that conducted by L. P.
Herrington, we shall discover a statistical way of defining a
psychological trait of variability, and we shall find that this trait has
predictive value with respect to the total amount of activity in
which a subject engages. Finally, in contrast with Herrington’s
study, we shall review the study on motility conducted by Sears.
In this study, motility was the variable defined, subjects were led
into situations of success or failure, and the effect of this success or
failure on the amount of motility was noted. The chief result which
Sears reports is an increase in variability. We have here two opposing
but complementary approaches. Herrington starts out with the
trait of variability and finds that it is predictive of activity, whereas
Sears starts out with activity and finds that it is predictive of
variability.


BARKER 


The experimental situation which Barker designed made it possi- 
ble for him to present in pairs the names of two liquids. He asked 
each of his subjects to turn an upright (vertical) lever from the 
center of the table (at a position intermediate to the two cards on 
which the names of the liquids were exposed) toward the name of 
the liquid he would prefer to drink. In one trial the subject actually 
had to drink the preferred liquid (6 cubic centimeters), but in an 
alternate trial he merely had to indicate his preference and did not 
have to drink the liquid. 

Prior to the experiment proper, each subject had been asked to 
taste each of the seven liquids involved. These were pineapple juice,
orange juice, tomato juice, water, lemon juice, salt water, and 
vinegar. These liquids were ranked individually for each subject so 
that his order of preference was known. 

Nineteen boys, aged 9 to 11 years, were tested. Nine of them were 
given the “real” sequence first, followed by the “hypothetical” 
sequence, while the other ten were given the “hypothetical” se- 
quence first, followed by the “real” sequence. When the order of 
preference had been determined for a subject, he was presented with 
a list (serially and successively presented) of all possible pairs of 
stimuli. Since seven liquids were involved, this meant that there were 
21 possibilities. Two series were run, one in which the subject had to 
taste the liquid and the other in which he merely had to indicate his
preference. A total of 42 choices was required.
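The count of 21 pairs, and hence 42 choices over the two series, follows from taking every unordered pair of the seven liquids. A brief sketch of that enumeration is given here; the variable names are illustrative, and the list order is simply the order in which the liquids are named above, not any subject's preference ranking.

```python
from itertools import combinations

# The seven liquids named in the text (order here is not a preference ranking).
LIQUIDS = [
    "pineapple juice", "orange juice", "tomato juice", "water",
    "lemon juice", "salt water", "vinegar",
]

# Every unordered pair of distinct liquids.
pairs = list(combinations(LIQUIDS, 2))
n_pairs = len(pairs)

# Each pair is presented once in the "real" series and once in the
# "hypothetical" series.
total_choices = 2 * n_pairs
print(n_pairs, total_choices)
```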

The lever which the subject had to operate to indicate his choice
was arranged so that its movements were recorded on an
automatically moving tape. The times of exposure of the cards were also
recorded so that the time between the exposure of the contrasting
stimuli and the occurrence of the response could be determined. To
make the consummation of the response a definite affair, the lever
was constructed so that it would have to be turned through an arc
of 5 inches, and would then cause a buzzer to ring. The ringing of the
buzzer indicated that a choice had been made. Prior to this ringing,
the lever could be turned as much as the subject desired (short of 5
inches) and in either direction. No amount of turning counted
unless the buzzer was sounded. Once the buzzer sounded, however,
the trial was concluded, and the choice was considered made.

Two measures of the resolution of conflict were obtained, namely,
the total time, in seconds, before the buzzer sounded and the number
of wavers which occurred before the choice was made. A waver was
defined as “a shifting of the lever followed by a return to the original
position” without the buzzer’s having sounded. The wavers were
arbitrarily classified into small, medium, and large and were given
weights of 1, 2, and 3 in this order. A subject’s vicarious
trial-and-error score consisted of “the sum of his wavers as thus weighted
for the extent of displacement.” Barker computed the mean time
required and the mean vicarious trial-and-error score for conflicts
between liquids separated one step, two steps, three steps, and so
on, up to six steps in the preference series. The data obtained show
clearly that the greater the conflict the greater the time required
for its resolution and that this additional time is required by the
increased amount of vicarious trial and error before the choice is
made.
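Barker's weighted waver score, as described above, is a weighted sum and can be sketched in a few lines. The size labels and the function name are hypothetical; only the weights of 1, 2, and 3 for small, medium, and large wavers are taken from the text.

```python
from typing import List

# Weights for the extent of lever displacement, as given in the text.
WAVER_WEIGHTS = {"small": 1, "medium": 2, "large": 3}


def vte_score(wavers: List[str]) -> int:
    """Vicarious trial-and-error score: the sum of wavers weighted by size."""
    return sum(WAVER_WEIGHTS[size] for size in wavers)


# Hypothetical trial with four wavers before the buzzer sounded.
example = vte_score(["small", "large", "medium", "small"])
print(example)
```

A trial in which the lever is turned straight to one side, with no wavers, scores zero.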


The data we have just discussed take into consideration the
difficulty of the conflict only in terms of the number of steps between
the two stimuli offered in comparison. We c[...] Barker, whether
the relati[...] juice, are considered apart [...] for example,
between [...]

HERRINGTON 


Barker’s study shows the way we can use experimental data in
relation to personality theory. N[...] study which shows us how to
relat[...] level (a psychological concept) and various physiological
measures such as “basal metabolic rate, total metabolism, pulse,
respiration and blood pressure.” On both the psychological and
physiological levels, Herrington was concerned primarily with mean
scores and with individual variability. Eleven male medical students,
aged 19 to 24, were the subjects for this study.


TABLE 181. Means and Standard Deviations for Four Physiological Functions under
Basal Conditions*

         Basal calories    Basal metabolic   Systolic blood    Respiration       Pulse per
         per hour          rate, percentage  pressure,         per minute        minute
                           Dubois            millimeters of
                                             mercury
Subject     Mean     SD       Mean     SD       Mean     SD       Mean     SD       Mean     SD
1           75.8   4.15      104.1   5.70      110.4   6.93       17.6   2.82       70.1   5.53
2           72.5   3.95      100.2   5.45      113.4   5.06       10.8   1.50       64.7   2.57
3          . . .   3.28       94.2   5.05       99.5   4.28       15.6   0.93       68.8   4.30
4           62.4   3.84       89.4   5.50      103.6   6.78        7.2   1.33       53.8   4.31
5           74.9   3.56       94.5   4.50      104.8   4.77        9.2   1.68       65.2   3.43
6           93.7   4.45       97.4   5.80      108.9   3.57       10.1   2.69       65.3   4.95
7           69.0   3.59      102.6   5.35      103.2   4.34       14.5   2.23         61   4.64
8           61.0   4.49       96.3   7.10       96.8   6.07       10.0   1.40       58.2   2.80
9           64.0   5.46       90.8   7.75       96.2   3.74        8.7   1.84       52.8   3.39
10          70.2   3.16       96.6   4.35      116.4   5.31       12.1   1.03       53.3   4.44
11          61.9   3.35       90.5   4.90      102.8   3.83       10.1   0.91       50.7   2.94


* From McNemar, Q., and Merrill, M. A. (Eds.) Studies in Personality. New York:
McGraw-Hill Book Company, Inc., 1942.


Table 181 shows the means and the standard deviations of the 
distributions for 45 daily observations (secured over a period of 
ninety days) of the physiological functions for the 11 subjects of 
the experiment. These data are of particular interest because of 
Herrington’s claim that they are the most accurate determinations 
of these functions ever reported. 

Herrington claims that there are at least two contributory causes
of the variability in these physiological measures. The first is due to
error and to uncontrolled factors, and our only interest in these is to
eliminate them as far as possible. Over and above the portion of the
variability due to error and to uncontrolled factors, however,
Herrington believes that there is a proportion that is “basically
biological and is a reflection of the delicacy with which a given
function is regulated.” This being the case, it becomes important,
says Herrington, to determine the relationship between the mean
intensity of a physiological function and its variability, this latter
being “heavily weighted with intraorganic factors of homeostatic
significance.” The correlations which Herrington reports are given
in Table 182. The standard errors are large, so Herrington is forced
to conclude that there has been demonstrated no significant relation


TABLE 182. Intercorrelations between the Means and Standard Deviations for Each
of Four Physiological Functions*

Function                        r      Standard error
Basal metabolic rate          . . .         .30
Systolic blood pressure        .13          .30
Respiration rate              . . .         .27
Pulse rate                     .42          .25

* From McNemar, Q., and Merrill, M. A. (Eds.) Studies in Personality. New York:
McGraw-Hill Book Company, Inc., 1942.
. . . and intensity of these four physiological . . . er. He concludes
that variability is sufficiently . . . mean intensity to constitute,
itself, some measure of a “general factor of intraorganic control.”
This being the case, he considers each of the four standard deviations
as but an estimate of this general factor and combines them, giving
equal weight to each, into a composite index of this general factor.
Now our question is “Does this index have utility as a predictor of
general activity?”
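One way to picture an equal-weight composite of this kind is sketched below. The text says only that the four standard deviations were combined with equal weight; converting each function to standard scores before averaging is an assumption made here so that functions measured in different units contribute equally. The function names and the three-subject data are hypothetical and are not Herrington's figures.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores of one physiological function across subjects."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_index(sd_table):
    """Equal-weight composite of per-subject variability.

    sd_table maps a function name to the list of each subject's
    standard deviation on that function (same subject order throughout).
    """
    standardized = [z_scores(column) for column in sd_table.values()]
    return [mean(per_subject) for per_subject in zip(*standardized)]


# Hypothetical standard deviations for three subjects.
sds = {
    "basal_metabolic_rate": [5.0, 6.0, 7.0],
    "systolic_blood_pressure": [4.0, 6.0, 5.0],
    "respiration": [1.0, 2.0, 3.0],
    "pulse": [3.0, 5.0, 4.0],
}
index = composite_index(sds)
print([round(value, 2) for value in index])
```

A subject who is relatively variable on all four functions receives a high index, and one who is relatively stable on all four receives a low one.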

To get the answer to this question, Herrington had each of the 11
subjects rated on activity by three raters. One rater was a junior
. . . member who had . . . a third rater was a . . . student in the
medical class concerned. Each of these raters was provided with the
following instructions:


In the course of the next 90 days please observe the following 11 men for the
purpose of rating them individually on general activity and drive. We cannot define
this trait precisely but you are asked to consider (1) physical vigor as suggested
by athletic pursuits, speed of movement, and typical postures at work or when
idle, (2) excitable speech and pressure for expression in group situations, (3) energy
and enthusiasm in meeting class work requirements . . . man will be compared with
every other member of th . . . as more or less active in terms of the general
impression you gain from considering at frequent intervals the criteria listed above.
Since the period of observation will be relatively long, it is probable that you will
often see all or several of these men together, in class, gymnasium, laboratory,
lunch room, or at social gatherings. We believe the final judgment will be improved
if you make trial ratings in these face-to-face situations without preserving the
results.

The composite activity ratings resulting from these instructions
were found to correlate with the physiological measure of control
to the extent of .51. This indicates, according to Herrington, “that
the rated activity and objective measures of physiological variability
are related to some degree.” The activity ratings are more
highly related to the mean intensity levels, however, as the data in
Table 183 demonstrate.
TABLE 183. The Relation of Activity to 4 Physiological Functions*

                                                               Systolic
                                                               blood
                              Activity   Pulse   Respiration   pressure
Variability:
  Pulse                          .33
  Respiration                    .39      .57
  Systolic blood pressure        .56      .23        .05
  Basal metabolic rate          -.08     -.19        .26         . . .
Intensity level:
  Pulse                          .81
  Respiration                   . . .     .66
  Systolic blood pressure        .24      .23        .20
  Basal metabolic rate           .45     . . .       .75          .49

* From McNemar, Q., and Merrill, M. A. (Eds.) Studies in Personality. New York:
McGraw-Hill Book Company, Inc., 1942.


In relation to activity, from the standpoint of physiological
variability, systolic blood pressure appears to be most predictive,
and from the standpoint of mean intensity level, pulse rate appears
to be most predictive. The first, that is, systolic blood pressure,
correlates with activity level to the extent of .56; while the second,
that is, pulse rate, correlates with activity to the extent of .81. The
two physiological measures, systolic blood pressure (variability)
and pulse rate (intensity), correlate with each other in the neighborhood
of .10, so when combined, they produce a multiple correlation
with rated activity of .91. If this correlation is accepted as valid, it
indicates, according to Herrington,

. . . that the popular impression of activity in the individual is very closely related
to the balance of autonomic influences acting through the vagus and cardiac acceler- 
ator nerves upon resting cardiac rates and to the lability of blood pressure control, 
which, in turn, not only is affected by the above mentioned factors, but also reflects 
the general action of the vasomotor centers of the circulation. 
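The way two predictors combine into a multiple correlation can be sketched with the standard two-predictor formula. Whether Herrington used exactly this computation is an assumption; the function name is hypothetical. Because the text gives the intercorrelation of the two predictors only as "in the neighborhood of .10," the result from these rounded figures need not reproduce the reported .91 exactly.

```python
from math import sqrt


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2 -- correlation of each predictor with the criterion
    r12    -- correlation between the two predictors
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return sqrt(r_squared)


# Figures quoted in the text: .56 (systolic blood pressure variability),
# .81 (pulse rate intensity), and roughly .10 between the two.
R = multiple_r(0.56, 0.81, 0.10)
print(round(R, 2))  # 0.94
```

The formula also shows why nearly uncorrelated predictors are valuable: the smaller their intercorrelation, the more each adds to the prediction beyond the other.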


It seems clear from Herrington’s data that there is a fundamental
relation between activity and certain physiological functions. We
can best summarize the significance of Herrington’s study by quoting
his own conclusions.


Pressure of activity, in a broad psychosomatic sense, is an important aspect of
personality. In groups in which variations in intelligence and educational background
are controlled, it is possible to evaluate the trait by rating methods in a
manner that yields considerable objective evidence of validity. A partial physiological
background for the trait is strongly suggested by the association between it
and several characteristics of circulatory regulation. There is no reason to believe
that this association is a direct one. It appears more probable that the physiological
signs appearing as correlates are superficial indicators of much more general properties
of the autonomic nervous system.


SEARS 


We turn now to a consideration of an experiment conducted by 
Robert R. Sears. This experiment had as its chief purpose the 
determination of the effect of success and failure on motility or
activity. This motility or activity is measured in a different way, 
however, than it was in Herrington’s study. 

As soon as a subject arrived . . . Dr. Sears or by an assistant and
. . . room. If he was met by Dr. Sears, Dr. Sears said he was not quite
ready. If he was met by Dr. . . . subject that Dr. Sears was not quite
ready or was a little late but . . . then left alone for six . . . the
period was carefully . . . way vision screen and wearing a pair of
earphones that gave a click every five seconds, made 70 observations.
These observations cons . . . standing or sitting . . . r the
environment, . . . e environment perceptually but without actual
manipulation of anything, and overt movement such as walking,
manipulation of the environment, object-directed behavior.

The reliability of the observations, as measured by agreement 
between two observers, was quite high. The experiment proper was 
not conducted until the observers had had sufficient practice, and 
were found to disagree with each other on no more than 3 of the 70 
judgments in the various observation periods. 
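The agreement criterion can be restated as a proportion. The sketch below assumes simple percentage agreement, which is what "disagree on no more than 3 of the 70 judgments" suggests; the function name and the category labels in the small example are hypothetical.

```python
def percent_agreement(observer_a, observer_b):
    """Proportion of judgments on which two observers gave the same category."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must judge the same intervals")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return matches / len(observer_a)


# Lowest agreement allowed by the criterion: 3 disagreements in 70 judgments.
minimum_agreement = (70 - 3) / 70
print(round(minimum_agreement, 3))  # 0.957

# Hypothetical short record: the observers differ on the last interval only.
a = ["sitting", "walking", "sitting", "manipulating"]
b = ["sitting", "walking", "sitting", "walking"]
print(percent_agreement(a, b))  # 0.75
```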

Upon the basis of the activity or motility ratings secured in the 
preliminary observation period, the 24 subjects were divided into 
two equated groups of 12 subjects each. The members of one of these 
groups were then put through a card-sorting test in which they 
experienced success, and the members of the other group were put 
through the same card-sorting test but conditions were so arranged 
that they experienced failure. Subsequent to these experiences, 
that is, success or failure, the groups were again observed on motility. 
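The text does not say how the two groups were equated. One common procedure, offered here purely as an assumption, is to rank the subjects on the preliminary score, take them in adjacent pairs, and alternate which group receives the higher member of each pair. The subject labels and scores below are hypothetical.

```python
def equate_groups(scores):
    """Split subjects into two groups matched on a preliminary score.

    scores maps a subject label to that subject's preliminary score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    group_a, group_b = [], []
    for i in range(0, len(ranked), 2):
        pair = ranked[i:i + 2]
        if (i // 2) % 2:  # alternate which group gets the higher scorer
            pair.reverse()
        group_a.append(pair[0])
        if len(pair) > 1:
            group_b.append(pair[1])
    return group_a, group_b


# Hypothetical preliminary motility scores for 24 subjects.
scores = {f"S{n:02d}": 20 + n for n in range(1, 25)}
success_group, failure_group = equate_groups(scores)

mean_a = sum(scores[s] for s in success_group) / len(success_group)
mean_b = sum(scores[s] for s in failure_group) / len(failure_group)
print(len(success_group), len(failure_group), mean_a, mean_b)
```

With these evenly spaced hypothetical scores the two groups of 12 come out with identical means; with real scores the match would be close rather than exact.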

Immediately following the preobservational period the subjects
were asked to participate in a card-sorting experiment. After the first
or second trial the attention of the subject was drawn to a large
chart on the wall, purporting to give the results for some Ohio high-school
students. The chart was purely fictitious but was designed so
that anyone in the experimental group paying close attention to it
and comparing the scores on it with his own true scores could not
help but consider himself a failure. To create a feeling of success
the second group of subjects were given fictitious time scores. These
scores showed the subject that he was superior to about 85 per cent
of the Ohio high-school students.

Following the card sorting a six-minute observation period was
allowed, and then a second experiment, irrelevant to this report,
was performed. On the second day the subject was again asked to
sort cards and was again made to experience success or failure. A
six-minute observation period followed. Following this a task in
which the subject was made to feel successful was given, and then
there was a final six-minute observation period.

To make it seem plausible to the subject that there should be so
many periods (four altogether) during which the experimenter could
not be in the room, various excuses were offered. For the first period
it was that Dr. Sears was a little late or that he was not quite ready.
For the second period he had to leave the room to get some material
for the next part of the experiment. For the third period Dr. Sears
received a long-distance telephone call, and for the last period he
had to go to the cashier to get money to pay the subject.

The data show a clear-cut change in autism as a result of the 
experience of success or failure. In the preliminary observation 
period, the success group exhibited this kind of behavior with 
greater frequency than did the failure group, but on the two experi- 
mental days, the relation was clearly reversed, and there is no 
question of the significance of the difference. 

In choice of activity there was also found to be a significant differ- 
ence between the success and failure groups. In the preliminary 
period, the success group paid more frequent attention to the cards, 
but after success or failure had been experienced, it was the failure 
group that paid more frequent attention to the cards. Sears points 
out that this “card oriented activity was a variety of persistent 
non-adjustive behavior that almost guaranteed a perpetuation of the 
feelings of failure.” 

The success and failure groups also differed from each other in the 
amount of time they spent looking at the Ohio chart and in the 
number of times they forgot to give an estimate for their scores on 
the next succeeding trial in the card-sorting experiment. The success 
group looked at the chart more frequently, and the failure group 
forgot to give their estimates more frequently. Sears points out that 
both these phenomena are in line with Thorndike’s law of effect. 
Success has stamped in an activity (that of looking at the chart) and
failure has tended to stamp out an activity (that of giving an estimate
as to the next succeeding score). Sears concludes:


These data reveal three characteristics of the reactions to failure that are deserving
of further consideration. First, although there was no evidence of a decrease in
object-manipulative activity, the general motility level was less for the failure than
for the success subjects. The frequency of day-dreaming and autistic thinking was
sharply increased, and the social responsiveness was reduced. These changes inevitably
serve to modify the effectiveness of a person’s relation to his environment.
He is less sensitive to changes, less likely to perceive new instigators. He is not so
adjustive or so modifiable as he would be if the failure had not occurred. All this
reduces the possibility of his having new experiences or of initiating new ways of
behaving. He avoids his environment.

Second, in direct relation to this, failure leads to a dogged but ineffectual continuation
of the task at which failure occurred. What interaction with the environment
there is is in the direction of the old activity. But the old activity is half
avoided, the card sorting is unfinished in order to avoid the danger of failing; this
effectively precludes success, and therefore the person fails anyway. This persistent
nonadjusting behavior . . . necessarily prevents the development of adjustive
responses. There is no seeking for new tasks or new methods to circumvent the
failure. Worse, this failure-induced behavior alienates the environment. Non-responsive
persons are neither pleasant companions nor cooperative instruments in
activities that require mutual assistance. . . .

Finally, the process of decontextualization that failure subjects exhibit serves
in still another way to reduce their adjust-effectiveness. This process splits off the
activity from its social frame of reference, reduces its contact with reality, and hence
decreases the opportunities for the person to check up on the task’s importance by
reference to reality. In a sense, decontextualization as a response to failure might
be said to reduce the influence of the reality principle, to make reality testing more
difficult.

We have now demonstrated through our accounts of the experiments
conducted by Eysenck, Barker, Herrington, and Sears that
the measurement of personality variables can be attacked through
experimental means. The treatments involved are much more time-consuming
and expensive than most of the other techniques we have
discussed. But they are, of course, much more subject to control.
In our present state of knowledge, we are forced to pay a high price
either way we turn. If we want quick and easy techniques of personality
measurement, we find that we pay heavily in our loss of
control over the situation in which the subject’s responses are to be
made. But when we set up situations so as not to lose this control,
we find our techniques so cumbersome, expensive, and time-consuming
that we can apply them, at best, only to a few individuals.
It is to be hoped that whatever direction personality testing may
eventually take, the techniques will possess both maneuverability
and control. In fact, personality testing must proceed in this direction
if we are ever to realize the values which we assume such
measurements to possess.


16 


EVALUATION AND SUMMARY 


We have by no means exhausted the field of personality measurement.
But the methods we have treated are important and should be
thoroughly understood. Once they are understood, the basic principles
can be applied to the understanding of any personality test
now in existence.


into the methodology of 
uction—an insight that he cannot get from a 
treatment has enabled 


a t superficialities, for the 
ne multitudinous supply of per- 


have been employed in personality-test construction. 
In our treatment of the subject we have, perhaps, been overneat
in the suggestion of a basic dichotomy within each of the areas we
have chosen to discuss. Our treatment seems to indicate that there
are just two opposing methods of measuring attitudes, that there are
just two opposing methods of measuring interests, that there are
just two opposing methods of measuring adjustment, and so on.
This suggests a pattern which does not in fact exist. There are
multiple methods of measurement in each of the areas with which
we concerned ourselves. But we selected, in each instance, two
methods to point up some important contrast, to highlight different
emphases, to illustrate diverging principles, and so forth. It was our
feeling that such contrasts would add interest to our story and
would help the student fix more clearly in his mind the fundamental
methodologies involved.


METHODS OF MEASUREMENT 


Let us attempt a review of these methodologies and see if we can
arrive at some evaluation of their relative degrees of effectiveness in
contributing to our better description, control, and prediction of
human behavior. First, let us make a list of the methodologies we
have discussed. They are as follows:
An empirical approach          A rational approach
An a priori approach           An a posteriori approach
A unidimensional approach      A multidimensional approach
A diagnostic approach          A prognostic approach
A nonanalytical approach       An analytical approach
A perceptual approach          An imaginal approach
An observational approach      An experimental approach

This list shows that we have studied 14 methods of personality
measurement. This seems like a sizable number, and it is, but let
this fact not blind us to the equally important fact that there are
other methods of measurement (interviewing, for example) that
we did not treat in this volume. Now let us fix clearly in our minds
the basic idea in each of the 14 methods of personality measurement
we have studied.

Empirical Approach. An empirical approach is one which makes
use of the data derived from experience. In some way the uniformities
or dissimilarities derived from experience are harnessed for use in
the derivation of implications relative to personality structure. The


positive advantage of the method is that the implications are imbedded
in a solid structure of fact. Therefore, the method works.
This does not mean that our predictions can be 100 per cent accurate
or that our understanding is complete. The example we chose to
illustrate this method was the Strong Vocational Interest Test. This
does not mean that other methods of personality measurement are
not also empirical. But we can safely say that the Strong Vocational
Interest Test is the outstanding example of such an approach. Our
discussion of the development of the Strong Vocational Interest
Test makes it apparent that the empirical approach is a long, laborious,
and tedious one. But the practical value which thousands of
students have derived from it shows that the method pays handsome
dividends.

Rational Approach. A rational approach is one which starts with
theory and proceeds toward a predetermined objective without
reference to experience or to data derived from it. The purpose is
to be able to follow out systematically some particular line of reasoning
to see if this particular line of reasoning will lead to productive
results so far as the measurement of personality is concerned. A . . .
method can come only by reference to empirical data, but during the
course of test construction, theory serves as the basic and, perhaps,
only guide. Our example of this method was found in the Kuder
Preference Records. Kuder’s objective was to develop several scales
of measurement that would operate independently of each other. This
was done on the theory that a relatively small number of such scales
would suffice to tap all the major subareas of the general area in
which he, Kuder, was interested. The purpose of pursuing such a
rational objective lies in the supposition that the . . . r a much more
economical . . . ot be more economical in . . . al in the light of the
presumed fact that more useful information, and a greater amount of
it, would be secured.


We saw in Chap. 3 that the attainment of a rational objective,
even though explicitly set forth, is not necessarily an easy matter.
Kuder did not find it easy to develop his independent scales. But
these difficulties, while wearying, exasperating, and perhaps heartbreaking,
are not fundamentally important. They are mere impedimenta
along the road we choose to travel. Of great importance
in the rational approach, however, is the fact that the attainment of
our original objective tells us very little about the value of the
method as an aid to us in describing, understanding, predicting, and
controlling human behavior. We have this to learn after we have
accomplished our rational objective. Thus Kuder had to . . .
occupational data subsequent to the development of his Preference
Records to show that the information they provide is useful in
vocational guidance. In the empirical approach used by Strong this
final step was not necessary because the securing of such data was
propaedeutic to the development of the scales. Ergo, it was available
when they were complete. Kuder would argue, as we pointed out in
Chap. 3, that once the occupational data for his scales are as complete
as those for the Strong Vocational Interest Test, his Preference
Records will provide a more effective coverage of the areas involved.
We cannot properly pass judgment on this contention at present,
however, because the occupational data provided by Kuder are far
less complete and clear-cut than those provided by Strong for his
Vocational Interest Test.


A Priori Approach. An a priori approach is one in which the
measuring scale is prepared in advance and is completely ready
before it is given to the group or groups which the investigator is
primarily interested in studying. Their responses have played no
part in the basic construction of the scale. This allows the measuring
technique to transcend the particular group under study and gives
it a type of universal applicability. For this reason, it possesses a
certain degree of utility not found in certain contrasting techniques.
It possesses inherent disadvantages, however, as well as advantages.
One of the major disadvantages is that the scale-construction
process, being divorced from the groups to be studied, may not be
found appropriate. The continuum involved may be too long or too
short or misplaced. The words in such a scale may be too difficult
or too easy, and so forth. Another disadvantage is the requirement
that the scale be built ahead of the time it is to be used, and if the
variable under consideration is a transitory affair, such as our
attitude toward the North Koreans on such and such a date, we
may not have our scale ready in time to use it.

In general, however, the advantages of this technique have more
than outweighed the disadvantages, so the method has had widespread
use. Among the advantages that give the method its popularity
and utility are the standardized stimulus situations that can
be set up, the uniform scoring standards, and the objective nature
of the comparisons it makes possible.
A Posteriori Approach. An a posteriori approach is one in which
the measuring scale is prepared after the responses of the group to
be studied have been secured. It is in fact directly dependent upon
these responses. This gives the method the unique advantage of
being adaptable to the group being measured. It cannot help but be
appropriate since it is designed upon the basis of the data that the
group itself supplied. But this same advantage leads to the method’s
principal disadvantage: that the scale must be developed anew for
each group measured. And when this is necessary, we have no assurance
that we can maintain a proper degree of comparability among,
or the same standards for, these different groups.

This technique is advantageous, however, when we do not have
time to follow through on a thoroughgoing a priori approach. To
hark back to our previous example of attitude toward the North
Koreans on such and such a date, the a posteriori approach is about
the only one that can be readied in time to meet the situation.

Unidimensional Approach. A unidimensional approach is one
that provides an index that can vary back and forth on just one
linear variable. All it shows, therefore, is more or less of whatever
variable is being measured. The primary advantage of the method
is the clear understanding on our part of just what it is we are measuring.
This advantage accrues, however, only when we have succeeded
in confining our measurement to the intended linear variable.
Frequently we fool ourselves into thinking we have this when in
reality we have not. But on the assumption that we can achieve or
approximate the ideal under consideration, we can predict more
precisely any of the concomitant behavior variables tapped by, or
related to, our one linear measure. We can determine its precise
importance in contrast to that of other linear measures related to
the same concomitant variables.

The examples we gave in Chap. 6 are not ideal examples for a
demonstration of a unidimensional approach. The tests discussed
were designed by their authors, however, with the unidimensional
approach in mind, so the methodologies of test construction which
they followed show some of the ways in which psychologists have
tried to develop unidimensional personality-measuring instruments.

One of the major disadvantages of the unidimensional approach
is the paucity of information it provides. All we get is one score, one
rating, or one grade. This one datum may be of the utmost importance,
but we have no difficulty in thinking of other data which, if we
had them, would be of equal and, perhaps, of even greater importance.
Therefore, even if we had a perfect unidimensional index of
every important personality trait, it is conceivable that it would
take us so long to get the information we needed for each subject
that our measuring techniques would not serve any particularly
practical purpose.

We shall undoubtedly continue to attempt the derivation of unidimensional
methods of personality measurement for some time to
come, however, for there still is no agreed-upon list of personality
traits important for the basic understanding of human behavior.
This means that the field is wide-open for an attempt on the measurement
of any of the innumerable aspects of personality. Only
when many more attempts have been made shall we be able to
begin a synthesis to give some basic structure which all psychologists
can rely upon as a foundation for the understanding of human
personality.
Multidimensional Approach. A multidimensional approach to
the measurement of personality can imply any one of several things.

It can mean the simultaneous use of two or more of the unidimensional
approaches to the measurement of personality. We can
measure, let us say, several different attitudes by means of the
simultaneous application of several of the Thurstone attitude scales.
Or we can measure several different personality traits by the simultaneous
application of several of the tests we discussed in our chapter
on unidimensional approaches to the measurement of personality.

Or a multidimensional approach can indicate an approach such as
that exemplified in the Guilford-Martin Inventories, namely, the
simultaneous measurement of independent traits. This variation
of the approach requires the utilization of many more items than
are needed for the measurement of any one of the traits, but the
provision for measuring the individual traits could not have been
made unless all the items were considered in relation to and in
contrast with each other. In other words, independence of two or
more traits cannot be established if all the traits involved are not
measured simultaneously.


Lastly, a multidimensional approach may denote the use of one
set of items scored in different ways to give rise to measures on
several different personality traits. The outstanding example of this
approach is, of course, the Bernreuter Personality Inventory. This
variation assumes, as we pointed out in Chap. 6, that one item can
have significance for more than one personality trait. This position
can be granted, or we can argue that if such is apparently the case,
the seemingly disparate personality traits may be one and the same
variable parading under two different names. In the light of our
present knowledge, there seems to be no reason to deny the assumption
that an item can have differential degrees of significance for
substantively different personality traits.
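Scoring one set of items in different ways can be pictured with a small simulation. Everything in it is hypothetical: the yes/no responses are random, the two scoring keys are invented, and no real inventory's weights are used. It shows only the mechanics, namely that each scale score is a weighted sum of the same responses, so two identical keys must yield perfectly correlated totals.

```python
import random


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def total_score(responses, weights):
    """Weighted sum of one subject's item responses (1 = yes, 0 = no)."""
    return sum(r * w for r, w in zip(responses, weights))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_items, n_subjects = 40, 200
responses = [[rng.randint(0, 1) for _ in range(n_items)]
             for _ in range(n_subjects)]

# Two hypothetical scoring keys applied to the same items.
key_a = [rng.randint(-3, 3) for _ in range(n_items)]
key_b = [rng.randint(-3, 3) for _ in range(n_items)]

scores_a = [total_score(r, key_a) for r in responses]
scores_b = [total_score(r, key_b) for r in responses]
scores_a_again = [total_score(r, key_a) for r in responses]

print(round(pearson(scores_a, scores_a_again), 2))  # identical keys: 1.0
print(round(pearson(key_a, key_b), 2), round(pearson(scores_a, scores_b), 2))
```

The last line prints the correlation between the two keys beside the correlation between the two sets of total scores, so the two can be compared for any keys one cares to try.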

One of the major advantages of the multidimensional approach is 
its economy. It gives us data on several personality variables in 
just about the same amount of time as we can get data on just one 
unidimensional personality trait. And we presume, of course, that 
the greater the number of measured variables, the greater can be 
our understanding, prediction, and control of human behavior. This 
economy will vary with each particular variant of the multidimensional
approach. If the multidimensional variables are independent
of each other, we can cover more adequately a given segment of
personality structure than if the variables show substantial intercorrelations.
There is nothing sacred about uncorrelated variables,
however, and there is no reason for us to suppose that human behavior
or personality is such that it can best be described in terms of
uncorrelated variables. Most of our measures of personality provide
such gross measurements, however, that it is economical for us to
utilize uncorrelated variables as a first approximation to the true
nature of the personality dimensions in question. When our knowledge
becomes more precise, we may find it profitable to abandon
our present-day emphasis upon the economic utility of uncorrelated
measures of personality.


When the multidimensional approach in question is that exemplified
in the Bernreuter Personality Inventory, we run into a technical
disadvantage to which too little attention has been given. This is the
relation between the scoring weights and the [...] scale. Many
investigators have gone to much [...] interscale correlations by
administering tests [...] subjects and correlating the total scores.
They [...]
unaware of the fact that these total scores are a function of the 
item weights and that the intercorrelation between total scores is 
predetermined by them. To illustrate this fact in a concrete manner, 
suppose we had gone through two separate item-analysis procedures 
and had arrived at two sets of item weights. But let us suppose that 
the weights in one of these sets completely duplicated those in the 
other set. Their intercorrelation would be 1.00. Now, would we 
need to give this test to new groups of subjects and get their total 
scores on two variables to find that the intercorrelation between 
them is 1.00? Obviously not! But suppose the item-weight inter- 
correlations were .90, .80, or .50. Do not these values also determine 
pretty largely the intercorrelations we shall secure between total 
scores? The author does not pretend to know whether the relationship
between the total-score intercorrelation and the item-weight
intercorrelation is a strictly linear one, but it does not seem
unreasonable for us to assume that the relationship is pretty direct. 
This being the case, once we have established our item weights for 
our different scales, we have preestablished the intercorrelations 
between the total scores on these same variables. The scores on such 
a test cannot be used to establish anew the fundamental interrela- 
tionships involved. This fact, ignored by many investigators, calls 
into question much of the published interscale correlational data 
for tests such as the Bernreuter Personality Inventory and the 
Strong Vocational Interest Test. In fairness to Dr. Strong, we must 
point out that he has been aware of this problem and that he has 
presented his views on the subject in his book Vocational Interests 
of Men and Women. The reader will find it worth while to familiarize 
himself with Strong’s treatment of this subject. 
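
The dependence of total-score intercorrelations upon item weights can be made concrete with a small calculation. The following is only a sketch under stated assumptions, not the procedure of Bernreuter or of Strong: it assumes that each scale score is a simple weighted sum of the same item responses, and the function name, the weights, and the item covariance matrix are all hypothetical. Under the further assumption that the items are uncorrelated and of equal variance, the implied correlation reduces to the cosine of the angle between the two weight vectors.

```python
import math


def implied_score_correlation(w1, w2, item_cov):
    """Correlation between two total scores that are weighted sums of the
    same items, given the item covariance matrix (assumed known)."""

    def quad(a, b):
        # a' * item_cov * b, written out with plain loops
        return sum(
            a[i] * item_cov[i][j] * b[j]
            for i in range(len(a))
            for j in range(len(b))
        )

    return quad(w1, w2) / math.sqrt(quad(w1, w1) * quad(w2, w2))


def identity(k):
    """Covariance matrix for k uncorrelated items of unit variance."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


# Hypothetical item weights for three scales scored from the same six items.
weights_a = [3, -2, 1, 0, 4, -1]
weights_b = list(weights_a)        # scale B duplicates scale A
weights_c = [1, 1, -1, 5, 0, 0]    # chosen so that its products with A sum to zero

r_same = implied_score_correlation(weights_a, weights_b, identity(6))
r_zero = implied_score_correlation(weights_a, weights_c, identity(6))

print(f"duplicate weights:  r = {r_same:.2f}")
print(f"orthogonal weights: r = {r_zero:.2f}")
```

With duplicate weights the implied correlation is 1.00, and with the orthogonal weights it is zero, in both cases before any subject has been tested. When the items are themselves intercorrelated, the item covariance matrix enters the result as well, so the item weights alone do not settle the matter completely; this is consistent with the statement that the weights determine the total-score intercorrelations "pretty largely."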

One of the difficulties in the variant of the multidimensional 
approach exemplified by the Guilford-Martin Inventories is that of 
maintaining independence between the supposedly uncorrelated 
variables. In a standardization group, an investigator can keep at 
the problem until all correlations are at or are near zero. However, 
when he applies his test to a new group of cases, he is very apt to 
find a substantial increase in the interscale correlations. This has 
caused some surprise, particularly when the items basic to the dif- 
ferent variables were not in any way common to each other. These 
investigators have failed to realize, however, that even though the 
items were not the same, the subjects were. Therefore, whatever 
errors the subject made in answer to one set of items are undoubtedly
of the same kind he made in response to the other set of items. This
[...] of errors, even when nothing else is involved, is sufficient
to introduce substantial interscale correlations. The juggling
that an experimenter does on his standardization group merely
attunes his instrument to the sampling errors of that group. As soon
as he applies his test to a new group, he loses this basic attunement, 
and with this he loses his zero intercorrelations. 
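
The effect of errors common to the same subjects can be illustrated by a small simulation. This is a minimal sketch under assumptions of our own choosing and is not drawn from the Guilford-Martin data: two traits are generated as truly uncorrelated, each scale is scored from its own items, and a single subject-level error (a hypothetical response tendency) is added to both scores. The function names and the sizes of the variances are illustrative only.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_interscale_r(n_subjects, shared_error_sd, seed=0):
    """Interscale correlation when the two true traits are independent
    but every subject carries the same error into both scales."""
    rng = random.Random(seed)
    scale_1, scale_2 = [], []
    for _ in range(n_subjects):
        trait_1 = rng.gauss(0, 1)                # true standing on trait 1
        trait_2 = rng.gauss(0, 1)                # independent standing on trait 2
        shared = rng.gauss(0, shared_error_sd)   # error common to both scales
        scale_1.append(trait_1 + shared)
        scale_2.append(trait_2 + shared)
    return pearson(scale_1, scale_2)


r_without = simulate_interscale_r(5000, shared_error_sd=0.0)
r_with = simulate_interscale_r(5000, shared_error_sd=1.0)

print(f"no shared error:   r = {r_without:.2f}")
print(f"with shared error: r = {r_with:.2f}")
```

In this model the expected interscale correlation is the variance of the shared error divided by the total variance of a scale score, so a shared error as large as the trait itself yields a correlation in the neighborhood of .50 even though the traits are independent and no item is common to the two scales.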

With all its faults, hazards, and difficulties we can conclude that 
the multidimensional approach to the measurement of personality 
has much to offer. It gives us a deeper insight, we believe, into the 
basic structure of personality. And it gives us multiple landmarks 
by which to plot the course of human behavior. Maybe our
plotting is not very accurate, but it is something better than none
and would appear to hold out for us somewhat more hope than the 
continued exclusive use of the unidimensional approach. 

Diagnostic Approach. A diagnostic approach to the measurement 
of personality emphasizes the present. We want to know something
about our subject now, and we probably want to do something 
about it when we get our information. A student is failing in his 
studies, so we give him, among other things, an adjustment inventory
to see if his trouble lies in his emotional life. If we find that his
study habits are above reproach and that his home adjustment is 
poor, we infer that lack of home adjustment may be the seat of the 
student’s difficulty. And if our further investigation confirms this 
belief, we bring this to the attention of the student or, in other ways, 
try to do something about improving his home adjustment. We can 
see, therefore, that the utility of a diagnostic approach to the meas- 
urement of personality lies in the possibility that we can make 
immediate practical application of the results. Our prediction is
that something is now in need of correction or is not now in 
need of correction, and our further action or lack of action follows 
accordingly. 

No matter what our approach to the measurement of personality,
we want the maximum possible degree of objectivity, reliability,
and validity. But we [...] objectivity, greater reliability, and
greater [...] application to the individual. And for such individual
application, we need
more objective, more reliable, and more valid measures than when 
we wish to apply our implications to group behavior. The reader 
may object at this point and claim that we want all of our per- 
sonality measures to be applicable to individuals as well as to a 
group. This is true, but the fact remains that all approaches except 
the diagnostic do have value with reference to groups as well as with 
reference to individuals. The diagnostic approach does not have this 
twofold applicability, at least not to as great an extent as the other 
techniques of personality measurement. 

Prognostic Approach. A prognostic approach to the measurement 
of personality emphasizes some future outcome. We get our measure 
now, but our interest is in the prediction of some future event. 
Sometimes this future is only a little way off, but at other times it 
is a long way off. In either case, we are trying to get advance
information. Sometimes we might wish to apply the knowledge gained
so that the anticipated future course of action or the series of events 
can be changed, but most frequently we are merely interested in 
observing what happens in the future so we can assay the relative 
contributions of psychological and other factors to the outcomes 
which we observe. We chose to illustrate the prognostic approach 
by discussing marital-happiness prediction scales and the prediction 
of success in selling life insurance. These examples illustrate the 
importance, nay, the necessity, of a clear-cut definition and under- 
standing of that which is to be predicted and of our deciding ahead 
of time how the various outcomes are to be measured. Once these 
problems are settled, we can proceed to the measurement of our 
predictors and from these derive the basis for the predictions which 
we desire to make. 

Nonanalytical Approach. A nonanalytical approach to the meas- 
urement of personality is one which provides an index on any defined 
variable but which does not contain or provide any of the supporting 
reasons, or the particular bases, from which the index value was 
derived. We pointed out in Chap. 10 that such an index is frequently 
all we need and so can serve as a useful datum in personality
measurement. But the occasions when such an approach is permissible
are gradually being reduced. Therefore, over the years to come, we 
shall use nonanalytical methods of measurement less and less 
frequently. 

Analytical Approach. An analytical approach to the measurement
of personality is one which provides supporting reasons and
makes evident the bases for the indices derived from it. Any method
of measurement which requires that the indices be based upon
several or upon many specific items of behavior or response would
qualify as an analytical approach to the measurement of personality.
Some techniques of measurement provide more insight than others,
but this is not the crucial point in this discussion. The
objectives of the analytical approach are, of course, increases in the 
degrees of objectivity, reliability, and validity which characterize 
the resultant indices. But in addition to these, it permits an examina- 
tion of the responses or items leading to the over-all indices. In 
this way it is thought some insight will be gained. And this insight 
would not have come about with the use of a nonanalytical approach.
Analytical approaches, too, are usually more objective, more
reliable, and more valid than nonanalytical approaches. Also, the
approach permits more careful analyses designed to improve
objectivity, reliability, and validity.

Perceptual Approach. A perceptual approach is one that at- 
tempts to derive implications relative to personality from our vari- 
ous sense perceptions. These perceptions can be visual or auditory, or 
they can be in any of the other sense modalities. So far, the method
has been exploited only in the visual field. The theory is that the 
various things we see or hear may provide clues to personality 
structure. They do this, if they do it at all, because different indi- 
viduals see different things in the same objective and external 
stimulus. And it is thought that the fact that our personality struc- 
tures differ is one of the reasons for our seeing these different things. 
This approach holds much appeal because it seems possible that 
the personality nuances involved in differing perceptions to the 
same visual stimulus would not be apparent to the naïve subject.
And being naive to the nuances involved, the subject would not be
able to slant his responses to achieve any desired outcome in the
way he could in most of the other approaches we have discussed.

This theory can be questioned. The presumably naïve subject can 
read about the Rorschach Ink Blot Test just as easily as he can 
read something about the Terman-Miles M-F test, and as soon as
he does so, he is no longer naive with respect to the interpretation 
which will be put upon his responses. The naiveté comes directly 
from the Rorschach cult itself in its crude and shallow analogies.
If the Rorschach test has merit, very little of this merit has been 
demonstrated. And it has not been demonstrated because most 
Rorschachers back away from a rigid validation procedure. The 
test is not amenable to such treatment, they say. But there is no 
reason why the implications drawn from the Rorschach responses 
should not be subject to the same rigid validation procedures that 
we require of any other technique of personality measurement. 

Imaginal Approach. An imaginal approach to the measurement 
of personality makes use of our imaginal concepts in the same way 
as a perceptual approach makes use of our percepts for the derivation 
of insights into our personality. The chief difficulty with the method 
which we chose to illustrate, the Thematic Apperception Test, is its 
lack of objectivity. This permits or leads to low degrees of reliability 
and, of course, makes validation difficult. There is apparently 
something more to the Thematic Apperception Test than there is to 
the Rorschach Ink Blot Test, and we see in it considerably more 
sophistication than we see in the Rorschach camp. The method 
requires such highly and intensively trained examiners, however, 
that the technique is likely not to be applied widely. Of course, if 
we had somewhat greater demonstration of its practical utility in 
contrast with the utility of other methods of personality measure- 
ment, this would provide a degree of motivation for a sufficient 
number of psychologists to learn the technique and thus would give 
it more extensive application than it now enjoys. We must realize, 
however, that of all the techniques of measurement discussed in this 
volume, the validation of the Thematic Apperception Test presents 
the most difficult problem. It presents a difficult problem because, 
as we pointed out in Chap. 13, the only adequate method of valida- 
tion is against the findings of a complete psychoanalysis. This takes 
much time, and probably no one is going to want to spend the 
necessary amount of time required just to provide validation data 
for the Thematic Apperception Test. And, of course, we might 
question the validity of the psychoanalytic findings also. Psycho- 
analysis, at best, is a highly subjective procedure and is one which 
permits any one of many rationalizations to appear as a plausible 
explanation of a person's behavior. We have had ample evidence 
that psychoanalysts do not agree among themselves on the implica- 
tions of any given series of events or episodes of behavior. 

Observational Approach. By an observational approach we mean
one that utilizes performance as a basis for the derivation of
inferences with respect to personality. The performance which serves as
a basis for the inference is not controlled, however, or forced into
any particular channel. As far as the subject is concerned, his
behavior or performance is spontaneous or, at least [...]
aware, subject to his own control. There [...]
upon him which will cause him to exhibit [...] performance
rather than another, but these forces arise from the situation in 
which the performance takes place rather than from any manipula- 
tion imposed by the investigator. Thus, in Barker, Lewin, and 
Dembo’s study, the child played in one situation or in another. 
The investigators did not ask the child to exhibit one type of be- 
havior rather than another. Nor did they subject the child to any 
manipulation. The child was free to do what he wished. Barker, 
Lewin, and Dembo merely recorded what happened. We found in 
this study that careful observation can lead to significant inferences 
for personality. Those made by Barker, Lewin, and Dembo were 
demonstrably reliable and were presumably valid. But the tech- 
nique is cumbersome for general use, and cannot readily be applied 
to a large number of individuals in any rigidly systematic manner. 
Therefore, the observational technique is one best used to test a 
theory, to set up a hypothesis, and so forth. It cannot be picked up 
and carried around like a Kuder Preference Record and applied 
to one or to many individuals as the situation may require. It is 
possible, however, that ultimately the implications derivable from 
carefully controlled observational approaches can be incorporated 
into one of the other approaches that can be applied more readily
to many individuals.

Imagine how cumbersome and time-consuming it would be, for
example, to run 1,000 subjects through the procedures involved in 
Sears’s motility study. Therefore, the experimental method, like the 
observational method, will continue to exhibit its greatest utility 
in the exploration, setting up and derivation, and testing of hypoth- 
eses. Those that seem worthy of being applied, in a measuring 
sense, to many individuals will have to be incorporated somehow 
in one of the other approaches. It is too bad that the approach 
having in it the best of all possible controls cannot be easily applied 
to many subjects. It seems, however, to represent one extreme of a 
continuum. For when we seek methods of wide and easy applica- 
bility to many subjects, we find almost no control in the sense 
implied by the experimental approach. 

It is possible that we should be more patient and do a lot more 
work of a rigid experimental nature before we continue with some 
of the other approaches. But we are impatient and anxious to find 
ways of getting at the personalities of a lot of people, not of just a 
few. So we shall probably continue our varying approaches, some- 
times advancing on this front and sometimes advancing on that. But 
we must admit we are still a long way from having in our possession 
the kind of personality-measuring instrument that can give us 
information comparable in reliability and validity to that now 
provided by the better mental-alertness, intelligence, and achieve- 
ment tests. 


PERSONALITY VARIABLES 


We have discussed in this volume only seven measurement rubrics. 
These are interest, attitude, personality, adjustment, rating, projec- 
tion, and performance. And under each of these rubrics we have 
examined one or more test-construction methodologies. It may seem, 
therefore, that our coverage of the vast complex field of personality 
measurement has not been very extensive. However, under these 
seven measurement rubrics we have discussed over 250 identifiable 
personality traits. So our coverage of the complex field of personality 
measurement has not been so limited as it might at first have seemed. 

Unfortunately, the methods of measurement are not comparable 
to each other, so we must doubt the value of certain of the techniques 
while accepting the value of others. Another disturbing thing is the 
lack of correlation between two different instruments designed to
measure the same trait. We have not commented on this to any
great extent in the previous chapters, but there is abundant material
in the literature to suggest that for a guide to the comparability
of any two measures, we ought to rely more upon the technique of
measurement employed than upon the trait names used to identify
the scales.

The fact that we have touched upon more than 250 identifiable
personality traits raises the question as to how many traits there 
really are. The answer to this depends upon one’s point of view. If 
we are going to be extremely fastidious, the answer probably is an 
infinite number. But to give such an answer gets us nowhere. There- 
fore, Allport and Odbert, and more recently, Cattell, have attacked 
this problem and have tried to identify traits of some presumed 
psychological importance. We shall not discuss their studies, how- 
ever, as the interested reader can go directly to them. 

But no matter what the exact answer, if we have to deal with 
hundreds of traits to understand the human personality, we shall 
find our task a long, hard, and burdensome one. It is understandable, 
therefore, that much effort is being devoted to a search for some 
small and convenient number of traits of major importance which 
can serve as outstanding landmarks for our describing, under- 
standing, predicting, and controlling human behavior. It may be a 
false hope, but the fact that we can describe any point in space in 
terms of only three dimensions (north-south, east-west, and up- 
down) gives us the model we seek in the field of personality. We 
shall not mind settling for 10 or 15 variables, but to have to deal 
with 250 or more makes the problem exceedingly complex. 

The purpose of this text was to explain several of the major 
methodologies used in personality-test construction. If the student 
thoroughly understands these methodologies, he will be in a position 
to appreciate the differences in findings reported by different investi- 
gators. He will be able to see why certain results should be given 
credence and why others should be discounted. And it may be that 
some student, after familiarizing himself with the methodologies 
we have discussed, will be able to derive new methodologies and 
will be able to accomplish, more effectively, those objectives set 
forth at the beginning of this volume. 